
MICROCOPY RESOLUTION TEST CHART 
NATIONAL BUREAU OF STANDARDS 
STANDARD REFERENCE MATERIAL 1010a 
(ANSI and ISO TEST CHART No 2) 



DOCUMENT RESUME 



ED 247 895 

AUTHOR 
TITLE 

INSTITUTION 

REPORT NO 
PUB DATE 
NOTE 

AVAILABLE FROM 

PUB TYPE 

EDRS PRICE 
DESCRIPTORS 



IDENTIFIERS 



IR Oil 244 



Gallagher, Leonard J.; Draper, Jesse M. 

Guide' on Data Models in the Selection and Use of 

Database Management Systems. Final Report. 

National Bureau of Standards (DOC), Washington, D.C. 

Inst, for Computer Sciences and Technology. 

NBS-SP-500-108 

Jan 84 ^ / , 

75p. , 

Superintendent of Documents, U.S. > Government Printing 
Office, Washington, DC 20402. 
Guides - General" (050) 

MF01/PC03 Plus Postage. 

^Computer Software; ^Databases; Information 
Retrieval; ^Information Storage; ^Models; Programing; 
^Programing Languages; ^Standards 
o*Database Management Systems;' *Data Models; Network 
Models; Relational Models 



ABSTRACT 

A tutorial introduction to data ncodels in general is 
provided, with particular emphasis on the relational and net%;ork 
models defined by the two proposed ANSI (American National Standards 
Institute) database language standards. Examples based on the network 
and relational models include specific syntax and semantics, while 
examples from the other models are less precise. An example database 
is described and a discussion o{ the structures for modeling this 
database focuses on data types, actual structui^res, integrity 
constraints, and schema definition. An examination 
manipulation covers access from external ?ii*>juages 
examples designed to show the power of the network 
models in manipulating their data structures. Some 
features of the two models are then analyzed, including their 
specific benefits and limitations. /. final chapter addresses 
additional issues in the selection of a database management system 
(DBMS), including access control, backup, recovery, bulk loading, and 
concurrency control. Most of the topics pertain to the daily 
operation of the DBMS itself rather than to the logical structures 
and operations of the data model. (LMM) 
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Guide on Data Models in the Selection and 
Use of Database Management Systems 

LeoTiard J. Gallvigher 
^lauB^ M. Draper 



Selectintj a database management system ' in- 
volves matching users'* requirements and the capa- 
bilities of available products. One way to sim- 
plify .this task is to define data models identify- 
ing both data structures and the operations on 
thoffe structures. In the past every commercial 
produdt has implemented its own data model. Now 
technical committee X3H2 of the Am^^rican National 
3tandards Institute is working on specifications 
for two models that ^re similar but not identical 
to many existing prodih±|^. The network model is a 
structure-oriented model that is espebially suit- 
able for databases with static structures and a 
hi^h volume of^ record-at-aVtime processing. The 
relational model depends more heavily on opera- 
tions than structures 'and thus provides the flexi- 
bility to handle dynamic databases. Examples 
written in the draft Network Database Language knd 
the Relational Database Language demonstrate that 
both, models can answer complex queries in a 
straightfbrwaid manner. 

In additipn to the , issue of data models, 
prospective buyers of database software need to 
consider features that affect daily operations. 
Existing hardware and operating systems sometimes 
limit the choice to a few commercial products. 
Systems also vary widely in their facilities for 
backup and recovery, bulk loading, schema manipu- 
lation, concurrency control, and report writers. 

Key words: computer languages; computer software 
standards; DBMS; data management; data models; 
database management systems; network databases; 
relational j^atabases; system selection. 
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1. * INTRODUCTION 



Since the ,early 1960''s, when the formatted file systems 
'/that preceded modern generalized database management systems 
(DBMS's) were first introduced r application developers have 
been able to select from an increasing number of research 
.umd commercial DBMS products. Recent articles in the trade 
press list well ovor one hundred different vendors marketing 
nearly two hundred separate database management products 
[PERS81, SOFT82]. New products are announced continuously^ 
especially for small and medium-sized computers. 

Today — 'Without any internationalr nationals or 
Federal standards — virtually every commercial database 
product is unique. Furthermore^ extant database management 
systems are described primarily by reference documents that 
are sometimes incomplete: the products themselves provide 
the ultimate specifications. It is difficult, therefore, to 
characterize classes of these nonhomogeneous and undocument- 
ed database management systems. Potential users need a 
method for differentiating these products according to their 
fundamental capabilities without becoming overwhelmed by 
highly specialized features of specific implementations. 
Hhe concept of a data model provides such a means for clas- 
sifying and understanding implementations of d.atabase 
management systems. Fortunately, many DBMS products are 
based upon one of a small number of data models that ^ave 
received extensive attention in the research literature. 
Two of these models — the network and the relational.. — are 
cut rent candidates for American national standards. 

This report identifies the characteristics of seve4?al 
major data models,., with special emphasis on the two proposed 
standards. No one^data model is uniquely appropriate for 
all -database applications; the special characteristics of 
each application will determine the most appropriate data 
model. 



1.1 Data Models and Database Management Systems 

A data model is a collectiion of data structures togeth- 
er with a collection of operations that manipulate the data 
structures. for the purpose of storing, querying, or process- 
ing the structure contents. A data model may also include 
the integrity constraints defined over the data structures, 
or it may include access control facilities or, mechanisms 
for defining various external user views of the database. * 
Some data models provide physical storage structures and 
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physical access methods as part of the data model, but usu- 
ally a data model is limited to the. data structures and 
operations that are available to an end user and may be ac- 
cessed from an application program. . . ' 

A database management system is a general purpose » ap- 
plication independent, software package used in association 
with computer hardware to facilitate the entry, storage, 
processing, sharing, and retrieval of data from a database. 
The portion, of a DBMS that deals directly with the process- 
ing of the data structures of a data model is sometimes re- 
ferred to as the databasf. control system. A DBMS supports a 
data model and is an implementation of that data model. 
Some database management systems may support multiple data 
models by providing different user interfaces to the data- 
base. A DBMS provides for transformation of the logical 
data structures of a data model to the physical storage 
structures of a. particular hardware environment. The DBMS 
goes beyond the data model in that it must provide for com- 
munication with the operating system of the host computer, 
as well as interface with programming languages or associat- 
ed software systems such as data dictionaries, report writ- 
ers, statistical packages, graphics, and libraries of speni^ 
cial data processing^ functions. A DBMS generally provides 
concurrency control, backup, and restart, a^ well as dumping 
and loading facilities for all databases under its control. 



1.2 Evolution of Data Models and Standards 

In the early days of ^ data 'processing, when external 
storage consisted of punched cards or paper or magnetic 
tape, all file access was sequential; no data coqld be i:e- 
trieved without first passing over all previously stored 
data. Even with this restriction, however, data processing 
thrived since many applications like payroll or invoicing 
required no mqre than this limited access mexhod. \ During 
this time a data model consisted of just a sequence of 
records, although many sophisticated specif icatiohs of mas- 
ter and trailer records were forerunners of m(j>re general 
structures in^ later data models. 

With the advent of drum and disk external storage came 
the notion of direct record access where each record carried 
a unique record identifier that could be used in indexes or 
in chains of related records. This led, in the mid IseCs, 
to early hierarchical data models that allowed direct access 
of master records and record-to-record navigation over 
subordinate trailer records. It was also during this period 
that significant strides were made in shared access to data. 
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mo longer was data owned by a single application; Instead » 
data files were stored In a "database" separatii from appli- 
cation programs* with flle^ access controlled by a "database 
management system." 

The late 1960's and early X970^s brought a flurry of 
database research. Charles K. B^chman Is widely recognized 
as one of the early developers of the network approach to 
data management? his 1964 paper with S.B. Williams presented 
a flexible scheme foro linking together records of different 
types using a pointer-chain structure (BACH64]. E.P. Codd 
wrote the 1970 seminal ?aper that defined the >;oncepts of 
normalization and joins for relational taKes [CODD70]. 
cThis paper created a good deal of interest iv^ various, high 
level query 'and manipulation languages based on a predicat,e 
calculus. By the late 1970^s both Quel (STON761 and .Sequel 
[ASTR75] had achieved popularity as very powerful yet user 
friendly data manipulation languages. The hierarchical ap- 
proach to database management derives from a generalization 
of the repeating group structures found in many programi^ing 
languages. Hierarchical *systems evolved independently in 
the 1960'*s so that currently there are many different ver- 
sions in the marketplace. Finally, The entity-relationship 
approach for describing a database became popular in 1976 
with the publication of an article by P.P. Chen (CHEN76]. 
The model proposed originally did not include specification 
of any data manipulation bperations; instead, it focused on 
the specification of entity types and the relationships 
among them. Later authors have specified operations to make 
it a complete data model [JOHN82, SHIP81] . Two conferences 
have, been devoted to the logical design and application of 
databases defined using this approach [CHBN79, CHEN81] . 

The first attempt at a standard specification for a 
specific data model occurred ducing 1967-71 when the CODASYL 
Data Base Task Group defined structures and operations for a 
network database facility in the COBOL language. The first 
ANSI activity leading to recognition of different data 
models occurred in 1977 with completion of the ANSI/X3/SPARC 
Data Base Study Group report that presents a three-schema 
architecture for database management [SPAR77]. An ANSI 
technical committee, X3H2, was established in 1978- to define 
a data definition language for interface from various data 
manipulation languages defined by the programming language 
committees. In 1981, the scope of X3H2 expanded to include 
definition of ANSI standard structures and operations for 
the network data model. Finally, in 1982, X3H2 was asked to 
develop similar definitions for the relational data model. 
The results of these tasks are the Network Database Language 
(NDL) and the Relational Database Language (RDL) , ^^^'o draft 
ANSI specifications that are discussed in tni,s report. 
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These specifications are now lalrly stable, but they nay 
chanqe before final adoption. 

As mentioned earlier, there are no existing database 
standards at either the international , national, or Federal 
levels. However, the ANSI proposals^ are under critical re- 
view by a special international database experts group that 
will make recommendations to its parent International Stan- 
dards Organization committee. Federal representatives have 
been active participants in all ANSI database committees. 
It appears likely, at least in the near term, that interna- 
tional and Federal standards, will derive jErom_ anlfl be con- 
sistent with resulting ANSI standards. 

An important feature of the planned database standards 
is that no single interface specification will exclude other 
interfaces between the end user and the database. For exam- 
ple, both the NDL and RDL assume a programming language in- 
terface to the data, yet they acknowledge the existence of 
other user interfaces such as: ad hoc query and report writ- 
er languages, schema manipulation languages or data diction- 
ary interfaces, special transaction processing systems that 
take advantage of modern screen and graphics capabilities, 
and bulk loading or unloading facilities both for database 
backup and for database information interchange. Language 
specifications for chese additional capabilities are, at 
present, unique to each DBMS vendor. If and when standard 
, specifications become available, they should be upwardly 
compatible with established data model standards. 



1.3 Purpose and Organization of this Report 

The purpose ot^his report is to provide a tutorial in- 
troduction to d^a models in gen^jpal, with particular em- 
phasis on the r^ational and network models defined by the 
two proposed^ ANSI database language standards. Even though 
no current commercial product satisfies either specification 
exactly, the specified structures and operations are typical 
of existing capabilities in a wiie variety ot DBMS products. 
Thus the proposed languages can be used for comparison pur- 
poses in DBMS selections made even before the existence of 
conforming products. 

The next three chapters of this report con^stitute the 
tutorial introduction to data models. Examples based on the 
network and relational models [X3H283a, X3H283b] include 
specific syntax and semantics, whereas examples from the 
other models are necessarily less precise. Chapter 2 
describes an. example database and then focuses on database 




structures and their definitions in several models. Data 
manipulation is ths topic of Chapter 3, with a discussion of 
access from external languages and a number of examples 
designed to show - the power of the network and relational 
models in manipulating their data structures* Because exam- 
ples alone cannot demonstrate all the features of a 
language # Chapter 4 discusses some characteristic features 
of these two data models both their particular benefits 
and their limitations* 

While choosing the right data model for a database ii 
probably the most important aspect of DBMS selection r the 
data model per se does not specify all the essential 
features of a product. Chapter S discusses many of the oth- 
er issues in the selection of a DBMS: access control » back- 
up r recovery r bulk loading r and concurrency control. This 
chapter is not intended to be used as a specification! in- 
stead# it is included to aid in recognition of critical is- 
sues in the selection process. Most of the topics mentioned 
in . Chapter 9 perti|in;;to the daily operation of the DBMS it- 
self rather than to the logical structures and operations of 
the data model. 



2. DATA STRUCTURES AND DEFINITION 



For the purposes of this tutorial the examples refer to 
a database of information about employees and departments. 
Each department has a unique name* a specific location, and 
a set of employees. Employee records include name, age» 
manager, and salary r plus a history of the employee's posi- 
tions in the organization. Users of the database must be 
able to retrieve employee information either directly or by 
department. In describing the structures for modeling this 
database* the following paragraphs focus on data types, the 
actual structures* integrity constraints, and schema defini- 
tion. 



2.1 Data Types 



Every data model has a particular collection of data 
types. A data type is the definition of a set of values 
that can be represented in a data model. A value is primi- 
tive: it has no logical subdivision within the data model. 
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Such primitive values are the basis of definition for the 
other data structures of the model. Most models have a 
character string data type and at least one numeric data 
type. Some make a distinction between fixed- length and 
variable-^length strings 9 or between exact and approximate 
numeric valuea. Some numeric types may be defined with dif- 
ferent degrees of numerid precision. Other common data 
'types include calendar date, time-of-day, zip code, sex 
code, Boolean, money, complex numbers, long strings of text, 
enumeration types, or various forms of pointers and identif- 
iers. 

The ANSI database committee has defined" three common 
data types for network and relational databases. They in- 
clude character strings,- exact or fixed point numbers, and 
approximate or floating . point numbers. In addition, the 
proposed NDL and RDL specifications accommodate correspond- 
ing data types in varidus ANSI standard programming 
languages ( e.g., FORTRAN, COBOL, PL/1, and Pascal). In- 
stead of trying to describe the specification of each 
language -specific data type in the database interface, we 
will focus on the common types in our examples, referring to 
them as CHARACTER, FIXE J, and FLOAT, respectively. 

A character string is a finite sequence of characters, 
taken from some well-defined character set (e.g., ASCII, 
EBCDIC, BCD). Character sets for databases may include: the 
human readable "graphic characters" as specified by ANSI 
X3.4 [ANSI77], the complete set of 128 characters as speci- 
fied by ANSI X3.4, various international character sets as 
specified by ISO jS1:andards [IS073, IS082] or vendor specific 
character sets. Each character string has a fixed length, a 
positive integer associated with the' string that describes 
the number of characters in that string. Strings may be of 
variable length up to the fixed length, but logically they 
are padded with blank characters when used in comparisons. 

Numbers are values that have normal mathema^^ical pro- 
perties; they are defined as real numbers with decimal base. 
Fixed point numbers are assumed to be exact values, with an 
associated precision and scale factor. The. precision speci- 
fies the number of significant decimal digits, and the scale 
factor specifies the placement of the decimal point. Float- 
ing point numbers, which are assumed to be approximate 
values, consist of^a significand and an exrad. The signifi- 
cand is a fixed point number, and the exrad is an integer. 
The value of the number is the value of its significand mul- 
tiplied by 10 to the power of the exrad. Every floating 
point number has a precision that specifies the precision of 
the significand. 



2.2 Structures 



While data types should be consistent from one data 
model to another r data structures will by definition vary. 
What distinguishes a data model is not the lowest level 
values » but the organisation of those values into structures 
and the provision of appropriate operations on those struc- 
tures. The following sections show what a variety of data 
structures can be built up from the elementary data types 
just described. 



2.2.1 The Network Model . The network data model contains 
two basic data structures: records and sets. As the basic 
units of data manipulation^ records are stored^ erased, 
found y modif iedy and connected and disconnected from other 
records. Seta, the basic units of navigation, maintain 
inter-record relationships. Using logical scit access paths 
defined by the database schema » a user can move from one 
record to another • 

A record is a collection of data components » each of 
which is either a data item or an array. A data item con- 
sists of a single value; an array is a multi-dimensional 
table of values that is represented by a sequence of data 
items. ^Each such array has a fixed dimension that is a po- 
sitive ii.teger. Every positive integer less than ot equal 
to the dimension determines a direction for that arrays and 
each direction has an extent that Is also a positive in- 
teger. For example^ in a two-dimensional array there are 
two directions. The extent in the first direction deter- 
mines the number of rows of the tables and the extent in the 
second direction determines the number of columns. The 
number of data items that occur in an array is the product 
of the extent integers of that array. A data item within an 
array is referenced by a multi-dimensional subscript. An 
implicit row-major ordering of array items establishes a 
unique correspondence between a data item referenced by a 
subscript ^nd its sequential position in the array represen- 
tation. For example^ in a two-dimensional array with three 
rows and four columns , items 1-4 occupy the first row» items 
5-8 the second row» and items 9-12 the third row. The sub- 
script (2,3) thus references the seventh sequential posi- 
tion. 

All records in a network database are partitioned ac- 
cording to record type. A record type defines the com- 
ponents of each record occurrence of the record type and de- 
clares a record name for the record type and a component 
name for each component. Each record of the database is an 
occurrence of exactly one record type and consists of 



exactly the data items defined by that record type. 

A set is a structured collection of related records. 
It models the classical data processing notion of master and 
trailer records. Each set is an occurrence of a set type, 
which is the definition of a collection of sets all having 
the same^characteristics. The declaration of a set type 
specifies the name of the set type and the owner and member 
record types that are associated with the set type. A set 
establishes a relationship among itsxcomponent records that 
must be maintained by the DBMS. On^'i^Wdrd from each set is 
designated as the owner record of that set. Any other 
record in the set is a member record. Given a set type, 
there is exactly one set for each record of the owner record 
type. That is, in practice an occuri^etice / of the owner 
record determines an occurrence of the set. Each set may 
contain zero or more occurrences of each memSei: record type, 
but each member record occurrence may belong to at most one 
set. In practice, this restriction allows navigation from a 
member record to Its unique owner record./ The member 
records of each set of a set type ai?e • mair/tained in a 
sequential order determined by the ordering criteria of that 
set type. / 

The network model supports two special set types: 
singular and recursive. A singular set type has SYSTEM de- 
clared as its owner .record type. SYSTEM can be thought of 
as a special record type containing exactly one record y^ith 
no data items. Hence, there is only one /Occurrence of a 
singular set type, and it allows direct access to member 
records without first navigating through multiple oc- 
currences of an owner record type. A recursive set type has 
the same record type declared both as the J owner record type 
and as a member record type; thus, ii allows convenient 
representation of hierarchical relationships, such as 
manager /employee, among records of the sa|ne record type. 

All records must be distinguishable /by the DBMS, in- 
cluding those that participate in the same data structures 
and have the same values for each component data item. For 
this reason, each record is associated /with a unique record 
identifier called a database key, which is a conceptual, - 
implementation-dependent object used to maintain position in 
the database. Database keys are not directly available to 
an end user. The action initiated by /any database statement 
is dependent upon the <!^atabase keys that occur as values of 
special cursors maintained by the DBMS in a session state 
for each database session. / 
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One convenient network structure £or the sample data- 
base is illustrated in Figure 1. This view of the database 
is defined by a schema named COMPANY. Each rectangle 
represents a record type with the type name in the top half 
and the component names in the lower half. Set types appear 
as labeled arrows driawn from the owner record type to the 
member record types. The oval labeled SYSTEM represents the 
owner of a "singular set" that, according to a schema order 
clause, provides an alphabetical ordering of member records 
by employee name. 

In this example the DEFT record type contains com- 
ponents named NAME and LOG to contain the department's name 
and location. Each department's employees form an oc- 
currence of the set type PAYROLL, which, through an order 
clause in the schema, orders the employees of each set ac- 
cording to decreasing salary. Some of the information about 
an employee — name, age, and salary — appears explicitly 
in the EMP record identified by the employee's name. Other 
information, like the employee's department and manager, is 
actually contained in the s.tructure of the database rather 
than in a component within a given record. For some appli- 
cations it might make more sense to include in each employee 
record one component containing the department and another 
naming the manager. That, too,, would be a legitimate net- 
work structure, but it would not illustrate as many features 
of the network data model. The^ recursive set type MANAGES 
determines a one-to-many relationship from a manager to the 
employees directly supervised by that manager. Each employ- 
ee is the owner of a MANAGES set occurrence; however, if 
that person is not a manager, the set has ho member records. 
Note that one restriction inherent in the set structure is 
that no employee has more than one manager. Finally, the 
database contains in the record types SUPERVISORS and* STAFF 
the employment history of each employee. Notice that the 
only specified ordering of records of these types is through 
their membership in sets of the type JOBHISTORY. And since 
neither of these types includes a name component, their 
records are connected to an employee only by membership in a 
particular JOBHISTORY set, not by redundant data values. 



2.2,2 The Relational Model. In the relational model the 
primary data structure is the table, which is defined as an 
unordered collection of rows that are not necessarily dis- 
tinct. This assumption of nonuniqueness for rows contrad- 
icts some theoretical definitions of the relational mbdel, 
but it is consistent with most current implementations and 
with many, user requirements. Uniqueness is enforced, if 
necessary, by an integrity constraint. A row, which is the 
smallest unit of data that can be stored into or erased from 
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a table, is a non-empty sequence of values. The assumption 
that the values in a row are ordered is only important when 
values are referenced by position rather than by name. Each 
value belongs to a column, which is an unordered, named col- 
lection of values of the same elementary data type. The 
column is equivalent to the concept of a domain included in 
some theoretical definitions.' A column entry is the smal- 
lest unit of data that may be selected from a table or modi- 
,fied in a table.. Unlike the network model, the relational 
model does not support arrays of values. 

Each table is associated with a table definition that 
specifies the 'table name and table characteristics as well 
as the column name and column characteristics of each column 
of that table. Every row of the same table has the same 
cardinality and contains a value for every column of that 
table. The relational model supports the notion of a "null" 
value, which is comparable with but distinct from all other 
values. This special value is assumed for a column position 
in a row whenever no default or other non-null value is 
specifically assigned. 

Tables may be base tables, derived tables, or viewed 
tables. Base tables have persistent storage representations 
and persistent table descriptions, similar to the record 
types and set types defined by a network model schema. A 
derived table is a temporary table derived from one or more 
base tables during the execution of a^ database statement. 
As such, it does not become a permanent part of the data- 
base; it exists no longer than does the transaction in which 
it is defined. Viewed tables are derived tables that have 
persistent descriptions. Each viewed table provides 
subschema views of the database to external users. All 
three types of tables may be the objects of database state-^ 
ments in the relational data manipulation language. 

All rows in a table must be distinguishable by the 
DBMS, including those rows in the same table that have 
identical values for each column. For this reason, rela- 
tional implementations must rely on some kind of unique row 
identifier, be it physical location in a file or use of log- 
ical indicators or pointers, to distinguish rows. However, 
the relational model differs from the network model in that 
row identifiers are not used to represent inter-record rela- 
tionships; they are used only to maintain a cursor position 
within an individual table. , 

Figure 2 shows a possible relational structure for the 
sample database. There are four tables: DEFT, EMP, SUPER- 
VISORS, and STAFF. Whereas in the network database some re- 
lationships could be based on specifications of set types. 
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in the relational modei< they . depend \>n redundant data. 
Mote, for example the MANAGER Column in the EMP table, which 
takes values that must also appear in the NAME column of 
another row somewhere in the EMP table. In addition, all of 
the tables except DEFT include a component for the employ- 
ees" names, and employees and departments are associated by 
dynamic comparisons between the NAME column in DEPT and the 
DEFT column in EMP. The DEPT column in the EMP table thus 
partitions the employees by department in the same way that 
the PAYROLL sets do in the network example. The relational 
model does not maintain ordering of rows in base tables, so 
if the ordering of employees by decreasing salary in the 
network PAYROLL sets or the alphabetical ordering of employ- 
'ees in the network ALPHA set is important to an application, 
then ordering criteria would be specified by a cujrsor de- 
claration in an accessing module. 



2.2.3 Hierarchical Models. The basic structures of 
hierarchical models may be viewed as a subset of the network 
modiel data structures defined above. The main structure is 
a node (sometimes called segment or component) that is 
essentially equivalent to a network model record type. 
Nodes are connected one to another in a parent-child rela- 
tionship very much like the owner to member record relation- 
ship in a network model set type. The major restriction is 
- that no child node may have more than one parent node asso- 
ciated with it. In the network model this restriction would 
prohibit a record type from being a member of more than one 
set type. The hierarchical model thus places the same res- 
triction on set types that the network model places on set 
occurrences. This restriction simplifies data definition to 
the point that it is not necessary to define path names 
(i.e. network model set names) for the link between parent 
and child nodes; each such path is uniquely identified once 
the owner and child nodes are known. Another restriction is 
that hierarchical models often do not allow either direct 
access to a child node (as does a network model singular 
set) or circular node connections as in the network model 
recursive set. These restrictions on data organization lim- 
it the flexibility for defining highly integrated databases, 
but provide certain capabilities for operational efficiency 
and retrieval flexibility. 

Figure 3 shows a possible hierarchical structure for the 
sample database. The DEPT node is the "root" of ^e tree 
and has one child node (BMP) in addition to its. own data 
items. EMP in turn has two children, the nodes SUPERVISORS 
and STAFF. In this particular model the relationship among 
nodes' is conceptually tighter than in the network model. 
Each DEPT record includes a number ^f EMP records, each of 
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which \ncludeo either SUPERVISORS or STAFF records or both. 
Child node occurrences cannot exifet independently; of a 
parent node. The implementation of a hierarchy, however, 
could be , similar to that of a corresponding network data- 
ba8e% In either case relationships result from defined 
structures (perhaps pointers) rather than from redundant 
data values. 



2.2.4 Entity-Relationship Model. The entity structure of 
this model may be considered as equivalent to a relational 
table or a network record type. In most cases only elemen- 
tary data items are considered, so the array structure of 
the network model would not apply. As in the case of both 
record types and tables, one must assume the existence of 
unique entity identifiers to distinguish among entities that 
have identical data values. In most cases, such identifiers 
are not externally visible or accessible. The relationship 
structure of this model is a generalization of the network 
model set tyjpe. Instead of only one-to-many associations 
between one owner entity type and one or more member entity 
types, the relationships may be many-to-many among any 
number of participating entity types. The entity- 
relationship model is still in its research stages (CHEN81] , 
and commercial implementations have not yet established 
themselves in the marketplace. The* structures are used pri- 
marily for logical database design [CHEN82] . 



2.2.5 Other Structures. Many database management systems 
provide for the storage and manipulation of variable repeat- 
ing items. As an example, an employee record may contain a 
variable repeating item for the names of family members or 
for a listing of multiple telephone numbers. Early specifi- 
cations of the CODASYL network model included definition of 
this capability through nested repeating groups. Since CO- 
DASYL specifications derived from a COBOL programming 
language interface, repeating groups were define<3 by nested 
level numbers and OCCURS clauses just as in a COBOL record. 
A multi-dimensional table was defined using nested level 
numbers with an OCCURS clause at each level. Such repeating 
groups were deleted from the ANSI network model specifica- 
tions because each repeating group is representable in terms 
of arrays or in terms of additional record types and set 
types, "as an example, consider a nested record structure as 
follows: 
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RECORD NAME IS R 
v« 01" A PIC X(10) ' 

01 B OCCURS 5 TIMES 
02 C Pib X(10) 
. 02 D pCCURS 2 TIMES PIC,X(10) 

This structure could be represented as a 'single network 
model record type with A as an elementary item and C and 
D as arrays. The array C would be one-dimensional, oc- 
curring 5 times, and the array D would be two-dimensional, 
occurring . 5 by 2 times. Any references derived from the 
data name B would be lost to the data model, as would the 
association between occurrences of C and the f itst direc- 
tion of D. Explanations of such associations cquld be car- 
ried along as comments. ' 

A second representation of the above record structure 
could be through the 4<sfinition of additional-^ record types 
and set types. This approach is particularly suitable if 
data components B and D occur a variable number of times 
instead of a fixed number. For .example, A could be an 
elementary item in R, C an elementary item' in S, and D 
an elementary Item in T, where R, S, and, T« are diffeirent 
record types. The names of record types S and T would 
be derived from data components B and D respectively. 
Record type R would be the owner of a set type with S as a 
member, and S would be the owner of a set type with T as 
a member. 

The same situation could be described in the relational 
model using three tables logically associated by the ap- 
propriate primary and secondai^y keys (cf. Section 2.3.2) and 
referential integrity constraints. For example, the tables 
R (A,...), ^ (A,C,...), and T (A,C,D,...) would -convey 
essentially equivalent information with the following res- 
trictions: column A in table R and both column A, and column 
C in table S would have to be declared unique; column A of 
table S would have vfao .reference column A of table R; and 
columns A and C of table T^would have to reference columns A 
and C of table S. If the columns defined above were all of 
the record, then table T alone would suffice. We have added 
R and S to account for the cases where there are other com- 
ponents at both the 01 and the 02 levels. These components 
would constitute the other columns in tables R^ and S, 
respectively. 

* The CODASYL specification of the network model also in- 
cludes definitions for additional structures. An area is a 
collection of records together with a sequential . ordering 
over them. This structure exists in many commercial pro- 
ducts, usually as the logical grouping of record type 



populations stored together on the same physical file oi - 
storage device. Areas were deleted from the ANSI database 
definition because each area is logically equivalent to a 
singular set type with multiple member record types. The 
equivalence can be demonstrated by defining exactly one adr 
ditional singular set type to represent each area. The are^ 
name becbmes the set name of the new set type. Each record 
type having records in the area becomes a member record type 
of the singular set type, and the sequential order of 
records in the area determines the record order of member 
records in the singular set. 

A record order key is the declaration 4n a record type 
that record occurrences shall be ordered on given data items 
in a specif ied way. Record order keys were deleted from the 
ANSI network model specification because each such key is 
logically equivalent to a singular set type defined over 
that record type. The record order declaration becomes the 
order declaration for the member records in that set. 



2.3 Integrity Constraints 

Most data models provide some mechanisms for ensuring 
the integrity,, or validity, of values in the database. 
These mechanisms can be as simple as a specification of a 
domain that includes all valid values for a particular com- 
ponent or column, or as complex as an inter-record relation- 
ship that must be maintained by ever^ transaction. Integri- 
ty constraints depend on data structures and vary consider-^ 
ably from model to model. 

2.3.1 The Network Model. A network database definition may 
include specification ^of certain integrity constraints on 
the data items, records, and sets of the database. For ex- 
ample, the' length of strings and the precision of numbers 
are important constraints that should reside with the data 
itself. Additional integrity constraints may come in 
several forms. A check ccndition is an expression that must 
be satisfied by the values of a record when it is stored in 
the database or inserted as a member record in a set. A de- 
fault value is a value assumed by component occurrences in 
the absence of a specific value supplied by a user, and a 
unique constraint is a specification that no two records may 
occur in the database with identical values for specified 
components. Each set type description also includes specif- 
ic integrity declarations. A set ordering specifies whether 
the logical ordering of member records in a set is sorted, 
first, last, next, prior, or system default. If order is 
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sorted^ then a key declaration specifies the data items that 
determine the ordec-key. Set insertion declarations specify 
whether 'the insertion of a record as a member of a set is 
automatic^ structural r or manual. If insertion is structur- 
al r then values' fbr the structural data items determine how 
an owner record* is selected from the database. A set reten- 
° t'ion declaration specifies whether the retention of a member 
record in a set is fixed, mandatory, or optional. Examples 
of these integrity constraints occur in Chapter 4. 

" 2.3.2 The Relational Model. Somfe implementations or user 
installations of- ttre relational model include definitions 
for the primary key and secondary . key concepts. A primary 
key is declared for a single table; it consists of a column 
who^e values uniquely identify a row of the table. A secon- 

,da¥y key is also declared for a single table; it relates 
back to some existing primary key in a different table* The. 
secondary key .specifies a column of its table that assumes 
v^alues comparable to values of its associated primary key. 
The secondary key value of each row in a secondary key table 
identifies a unique row in the primary key table. In the 
absence of specific data , structures to represent inter- 
record relationships, primary and secondary keys are often 
used to maintain logical connections between tables. In the 
ANSI X3H2 relational specifications, the notions of primary 
and secondary keys are not explicitly defined. Instead, a 
prin|ary key may be assumed whenever a table unique con- 
straint ' is specified ov&r a single c6lumn, and a secondary 
key may be assCimed whenever a table referential, constraint 
is specified. A referential constraint requires that the 
"secondary key" of each row of the referencing table have a 
value that is identical to the "primary key" of some row of 
the referenced table. A referential constraint has some of 
the same features of the ' network model set type in that 
modify an(3 ^delete statements may cascade "from the primary 
key row to secondary key rows. 

A relational database definition may include specifica- 
^tion of additional integrity constraints on the rows and 
columns of tables in the database. As with the network 
model, this includes specification of length of strings and 
precision of numbers, as. well as declaration of default 
values for', columns and check conditions for rows and 
columns. Relational table constraints also include the 
unique and referential constraints defined in the preceding 
paragraphs Such integrity constraints are often considered 
to be part of the database structure. 



2.3.3 Other Models. The integrity constraints of a 
hierarchical database are specified differently by each im- 
plementation of a hierarchical data model. In most cases 
the integrity jonstraints are similar to possible con- 
straints in the proposed network and relational, standards.- 
Other modelsr including the entity-relationship model, can 
provide similar integrity constraints. In each case there 
can be atomic constraints on particular values or more ela- 
borate mechanisms (such as triggers) that automatically 
alter data items which depend on values changed by a user. • 



2.4 Schema Definition 

Every data model must provide some means of defining 
the structures of a particular database. Some models com- 
bine data definition and data manipulation in a single 
language^ but most separate them in one way or another. In 
some cases data definition is the province of the database 
administrator alone. 

Becaus'e the proposed ANSI database languages accommo- 
date language-specific data types, each schema or table de- 
finition has a language environment clause identifying an 
ANSI standard language. To avoid focusing on a single 
language, we have chosen to use in our examples a pseudo- 
language with the types CHARACTER, FIXED, and FLOAT. In an 
actual implementation PSEUDOLANGUAGE would have to be a 
standard language listed in the ANSI database specification, 
and the type declarations would match those of the chosen 
language. Programs written in other languages could access 
the database, but either the programs themselves or the 
language compilers would be responsible for conversions 
between database data types and the data types of the ac- 
cessing language. 



2.4.3 The Network Model. An example of the proposed schema 
defTriition language for describing a network database is 
given in Figure 4. The schema name is COMPANY. The NAME 
items and the LOG item have character string values; AGE 
values are integers having three significant digits, and SAL 
values are decimal fractions having eight significant digits 
with two decimal positions. The ORDER SORTED statements 
specify that member records are to be ordered by data item 
values rather than by some chronological order of insertion 
or application dependent positioning; the KEY clause in each 
member record description specifies the member data items 
that determine this order. In the PAYROLL set, the DUPLI- 
CATES phrase specifies that member records having identical 
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salaries ace allowed as duplicates, and the relative order- 
ing of duplicates is determined by system default. In the 
ALPHA set, the DUPLICATES phrase specifiiis that two member 
employees are not allowed to have the same name. The INSER- 
TION and RETENTION clauses specify insertion and retention 
integrity constraints for member records of the set type. 
The differences among automatic, manual r or structural 
insertion and fixed, mandatory, or optional retention are 
discussed further in Section 4.1.6. 



Defining the Example Database 
In the Network Model 



SCHEMA COMPANY 
ENVIRONMENT PSEUDOLANGUAGE 
RECORD DEFT 

NAME CHARACTER 3 



LOC CHARACTER 2 
RECORD EMP 

NAME .CHARACTER 15 
AGE FIXED 3 0 
SAL FIXED 8 2 ^ 
RECORD SUPERVISORS 

POSITION CHARACTER 20 
STAFFS I ZE FIXED 3 0 
STARTDATE , FIXED 6 0 
ENDDATE FIXED 6 0 

RECORD STAFF 

POSITION CHARACTER 20 

MANAGER CHARACTER 15 

STARTDATE FIXED 6 0 
ENDDATE FIXED 6 0 

SET PAYROLL 

OWNER DEPT 
ORDER SORTED 

DUPLICATES DEFAULT 
MEMBER EMP 

INSERTION MANUAL RETENTION MANDATORY 
KEY DESCENDING SAL 
SET ALPHA 

OWNER SYSTEM 
ORDER SORTED 

DUPLICATES PROHIBITED 
MEMBER EMP 

INSERTION AUTOMATIC RETENTION FIXED 
KEY ASCENDING NAME 



SET MANAGES 

OWNER EHP • 
ORDER SORTED 

DUPLICATES PROHIBITED 
MEMBER EMP 

INSERTION MANUAL RETENTION OPTIONAL 

KEY ASCENDING NAME 
SET JOBHISTORY 
OWNER EMP 
ORDER SORTED 

DUPLICATES PROHIBITED 
MEMBER SUPERVISORS 

INSERTION AUTOMATIC RETENTION FIXED 

KEY DESCENDING STARTDATE 
MEMBER STAFF 

INSERTION AUTOMATIC RETENTION FIXED 

KEY DESCENDING STARTDATE 

Figure 4 



2.4.2 /The Relational Model. Unlike the network model, the 
relational model does not contain specific schema and 
subschema definition languages. Its major data definition 
stateiient is C^ATE TABLE, which includes in its syntax the 
statement form for defining columns and constraints. Figure 
5 shbws the syntax for "defining a particular version of the 
samplfe database. This syntax defines tables without loading 
any Idata. For each table there is a CREATE TABLE statement 
that names the table and names and defines its columns. In 
the definition of EMP, for example, NAME is a column whose 
values are character strings of length 15, AGE can assume 
integer values from 0 to 999, SAL takes on fixed decimal 
value's from 0.00 to 999999.99, DEPT takes character strings 
of length 3, and MANAGER in like NAME. In addition, values 
in tile NAME column of a given row must be unique and non- 
null. 1 Finally, values in the DEPT column are tied by an in- 
tegri'ty constraint to values in the NAME column of the DEPT 
table. 



30 

•22- 



Defining the Ekanple Database 
In the Rela^onal Model 



\ 

CREATE TABLE DEPT \ 
ENVIRONMENT PSEUDOLANGUAGE 
NAME CHARACTER 3 NOT NULL 
LOC CHARACTER. 2 NOT NUi;.L 

1 

CREATE TABLE EMP 

ENVIRONMENT PSEUDOLANGUAGE ! 



UNIQUE 
DEFAULT "NY" 



NAME CHARACTER 15 NOT .NULL UNIQUE 
AGE FIXED 3 0 i 

SAL FIXED 8 2 

DEPT CHARACTER 3 REFERENCES DEPT. NAME 

CASCADE MODIFY RESTRICT ERASE 

MANAGER CHARACTER 15 



CREATE TABLE SUPERVISORS 
ENVIRONMENT PSEUDOLANGUAGE 

NAME . CHARACTER 15 NOT NULL REFERENCES EMP. NAME 
^ -CASCADE-MODIFY CASCADE ERASE 



POSITION 
STAFFS I ZE 
STARTDATE 
ENDDATE 



CHARACTER 20 NOT MULL 
FIXED 3 0 

FIXED 6 0 NOT NULL 
FIXED 6 0 



CREATE TABLE STAFF 
ENVIRONMENT PSEUDOLANGUAGE 



NAME 

POSITION 
MANAGER 
STARTDATE 
ENDDATE 



CHARACTER 15 NOT NULL REFERENCES EMP. NAME 

CASCADE MODIFY CASCADE ERASE 

CHARACTER 20 NOT NULL 

CHARACTER 15 
FIXED 6 0 NOT NULL 
FIXED 6 0 



Figure 5 



2.4.3 Other Models. Schema definition for other data models 
Is usually similar in form to the examples given above for 
the network and relational models. Since there is currently 
no candidate standard for other mocels, the exact syntax 
will vary from one product to the next. Vendors of 
hierarchical systems must provide a way of defining the 
structure of various nodes and their relationships to each 
other. Similarly, the entity-relationship model must enable 
users to define entities, attributes, and relationships. 
Whatever the structure of the data model, the DBMS will have 
to have not only constructs that are similar to records and 



-23- 31 




components, but also a means of definn^g these structures 
and any relationships between them. 



In addition t6 data structures, a data model specifies 
operations that insert^ delete^ or modify data in a data- 
base. The specification includes both the primitive opera- 
tions .of the model and a mechanism for calling those opera- 
tions from an external source. Whether that source is a 
batch program^ an interactive query language invoked from a 
terminal, or an entirely separate application system is Ir- 
relevant to the model, although it may be very important to 
a potential user trying to discriminate between competing 
commercial DBMS'^s. The following sections discuss some of 
the possibilities for data manipulation and give explicit 
examples of operations on the network and relational data- 
bases defined in Chapter 2. , 



3.1 Modules and Procedures 

As defined by ANSC X3H2, a module is a persistent ob- 
ject specified by either the Network Database Language (NDL) 
or the Relational Database Language (RDL) procedure 
language. It consists of a language environment clause ^nd 
one or more procedure definitions, together with cursor de- 
clarations if it is a relational module, and subschema and 
keeplist declarations if it is a network module. The 
language environment clause specifies the name of a standard 
language (e.g., COBOL) from which procedures in the module 
will be called. A procedure definition consists of the pro- 
cedure^ name, a sequence of parameter declarations, and a se- 
^ quence of statements for querying or modifying the database. 
The declaration of a parameter specifies its data type and, 
in a network module, specifies whether it is an elementary 
value or an array of values. A parameter either assumes or 
supplies values to a corresponding argument in a call of the 
procedure. 

Every module is associated with an application program. 
A procedure in the module is referenced by an external 
"call" from the application program. The call specifies the 
procedure name and supplies a sequence of parameter values 
corresponding in number and in data type to the parameter 
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declarations of that procedure. Each call of a procedure 
causes its sequence of statements to be executed. The 
parameters return values from the database to variables in 
the application program referenced by arguments in the call 
of the procedure. ' - ' 

Database modules and named procedures provide external 
programs direct access to NDL and RDL statements with no re- 
quired additions or modifications to the syntax of the ac- 
cessing languabfe. All that is needed is a correspondence 
between data types and the ability to call separately com- 
piled procedures written in a different language. Program- 
ming languages may use these facilities directly for inter- 
face to the DBMS, or they may call procedures, implicitly ei- 
ther by preprocessing embedded database statements or by de- 
fining native syntax for ipvoking DBMS functions. 



3.2 Database Access fxom External Languages 

As noted above, programmii g languages or application 
systems accessing a database need the following: 



* A correspondence between data types. 

* A method for calling DBMS procedures. 

A standard specification for these items is analogous to the 
problem of a standard specification for cross-language calls 
between any two programming lang'iages. A type correspon- 
dence between DBMS data types and programming language data 
types specifies the language data types that can be passed 
validly as parameters to a DBMS procedure. This type 
correspondence for ANSI COBOL, FORTRAN, PL/1, and Pascal is 
specified by the NDL and RDL proposals.' Type correnpondence 
in general can range from the very strict, with a one-to-one 
mapping of data types, to the very flexible, involving sig- 
nificant run-time conversions. The examples in Section 3.3 
contain a type declaration on each side of the interface, 
with an implicit assumption that the underlying type 
correspondence is well-specified. 

The DBMS procedure calling requirement can be satisfied 
in several ways, including: 
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* Explicit procedures written and called directly by 
the user. 

Implicit procedures embedded in a oCalling program and 
extracted by a preprocessor. 

Native syntax defined as an extension to a program- 
ming language. 

The following sections discuss each of these alternatives, 
and Section 3.3 provides simple examples. 



3.2.1 Explicit Procedure Approach. , The "explicit procedure" 
approach Ti available to any language having a. subroutine 
facility capable of calling separately compiled procedures 
written in a different language. In this case the end user 
writes both an application program ar^d a separate DBMS 
module for each database application. A standard program- 
ming language bompiler compiles the application program, and 
the DBMS itself .compiles the DBMS module. The module con- 
tains named database procedures, each consisting of a se- 
quence of DBMS functions written in the database procedure 

language syntax. Using its standard syn tax f or invoking 

subroutines , the appl Icat ion prW^^an* caTrs~a~'database " pro 
cedure by name. Variables defined in the application pro 
gram and passed to a DBMS procedure are declared as parame- 
ters in that procedure. A method for linking the compiled 
language program with the compiled DBMS module is 
implemen tor-def ined . 



3.2.2 Implicit Procedure Approach. 

The "implicit procedure" approach assumes specification of 
some variation of a preprocessor. References to DBMS func- 
tions are embedded directly in the application program using 
some convention to distinguish DBMS statements from standard 
programming language statements. In this case, the program- 
mer writes a single program containing a mixture of program- 
ming language statements and database language statements. 
The program is processed by a precompiler to produce a 
"pure" external program capable of compilation by a standard 
programming language compiler, and a database module, as 
above, capable of compilation by the DBMS. A method of gen- 
erating names for the DBMS procedures and calling them by 
that name from the "pure" program is Included tn the specif- 
ication of the preprocessor; the original source applica- 
tion, written by the end user, need not provide procedure 
names . 
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3.2.3 Native Syntax Approach. The "native syntax" approach 
involves adding specific database syntax to existing pro- 
gramming languages. This syntax could be any of the follow- 
ing : 

>' 

*\ Native syntax for each DBMS function. 

\ 

* New syntax for combinations of DBMS functions. 

* syntactic variations for triggering DBMS calls. 

In any ca^e, the programmer writes a single program contain- 
ing integ^a'ted syntax for programming language statements 
and database functions. The complete program is compiled by 
a standard programming language compiler to produce object 
code for a database session. If a language defines new com- 
bination functions, then each such function must be defin- 
able by a sequence of DBMS functions. A programming 
language developer Would have the option of defining syntac- 
' tic variations for calling DBMS functions. For example, an 
NDL_ test function could be triggered by a native language 
Boolean expression with the test result used in program con- 
trol statements. A programming language could also require 
a one-to-one correspondence between selected programming 
language variables and database data items. Such a 
correspondence would minimize the native language syntax 
needed for parameter passing between the DBMS and the ac- 
cessing language. If desired, the programming language 
could include a facility for handling automatically all ex- 
ception conditions so that the application programmer would 
not have to check manually the database status after each 
call to a DBMS function. 



3.3 Examples 

The rest of this chapter consists of a number of exam- 
ples designed to illustrate some of the more common opera- 
tions of the network and relational models. Other database 
models will probably be analogous in some ways to one of 
these two models. Navigational systems will have operations 
that resemble those of the network model, and systems that 
rely on retrieval through indexes will probably provide a 
selection language somewhat similar to the relational cal- 
culus. Whatever the model, the interface it provides to 
host programming languages may fall into one of the 
categories discussed above. 
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3.3.1 Explicit Prooedure Calls. Each explicit procedure ex- 
ample consists of a pure programming language main projgram 
with calls to DBMS procedures contained in a separate DBMS 
module. The main program is written entirely in the stan- 
dard syntax of a programming language; all variables are de- 
fined with programming language data types, and all subrou- 
tine parameters are declared and passed to the external pro- 
cedures in the manner standard to that programming language. 
The database procedures are written in the proposed database 
language and are* combined together into a single database 
module that can be processed separately by the DBMS. Each 
database procedure has a name that is referenced in the call 
from the mai^ program. Procedures declare data types for 
each parameter passed. Each procedure consists of a se- 
quence of DBMS statements that use. the declared parameters 
to invoke DBMS functions. 

Figure 6 is an example application for loading a net- 
work database defined by the COMPANY schema through the fol- 
lowing PERSONNEL subschema: 

SUBSCHEMA PERSONNEL IN COMPANY 
RECORD DEPT ALL 
RECORD EMP ALL 

SET PAYROLL « ^ 

SET ALPHA 



The program initiates a' database session, inputs values for 
DEPT and EMP data items from an external device, and then 
creates and stores new department and employee records in 
the database. The main program consists of four calls to 
database procedures. In line 8, READYDEPTEMP calls -DBMS 
functions that ready the DEPT and EMP record types for pro- 
tected update. In lines 11 and 14, STOREDEPT and STOREEMP 
pass parameters that carry values for data items of depart- 
ment and employee records. The corresponding database pro- 
cedures then « invoke DBMS functions to store the records in 
the database and to connect the records td any sets that re- 
quire manual connection. In line 17, PINISHLOAD calls DBMS 
functions that commit the modifications of the session to 
the database and finish access to the previously opened 
record types. 

For the relational example we assume that the relation- 
al database portrayed in Figure 2 has been loaded in a 
manner similar to that used to load the network database. 
Figure 7 is then a relational application that uses explicit 
procedures to select company employees whose current annual 
salary in thousands is less than twice their current age, to 
print the name of each such employee, and then to modify the 
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Samplie Program to Load a Network Database 
Using Explicitly Declared Procedures 



1. Declare 

2. DEPTNAME PIC X(3) 

3. DEPTLOC PIC X(2) 

4. EMPNAME PIC- X (15) 

5. ' . feMPAGE PIC 999 

6. BMPSAL. PIC 9.(6)V99 ; 

7. Begin Main Program 

8. Call READYDBRTEMP ; 

9. While (Not EOF )Do 

10. Accept DEPTNAME, DEPTLOC ; 

11. 1. Call STOREDEPT Using DEPTNAME DEPTLOC ; 

12. While (Not EOF ) Do 

13. - Accept EMPNAME, EMPA6E, EMPSAL ; 

14. ' Call STOREEMP Using EMPNAME EMPAGE EMPSAL ; 

15. Endwhile 
.16. Endwhile; 

17. Call FINISHLOAD 

18. End Main Program. 



1. 
2. 
3. 
4. 
5. 



8. 



11. 
12. 
13. 
14. 
15. 
16. 
17. 
18. 
19. 
20. 
21. 
22, 
23. 
24. 



MODULE 

ENVIRONMENT PSEUDO-LANGUAGE 
SUBSCHEMA PERSONNEL IN COMPANY 
PROCEDURE READYDEPTEMP 
READY DEPT SHARED UPDATE 
READY EMP SHARED UPDATE 
PROCEDURE STOREDEPT 
N CHARACTER 3 
L CHARACTER 2 
STORE DEPT 

SET NAME TO N 
SET LOC TO L ^' 
PROCEDURE STOREEMP 

N CHARACTER 15 
A FIXED 3 0 
S FIXED 8 2 
STORE EMP 

SET NAME TO N 
SET AGE TO A 
SET SAL TO S 
CONNECT EMP TO PAYROLL 
PROCEDURE FINISHLOAD 
COMMIT 
FINISH ALL 




Sample Program to Modify a Relational Database 
.Using Explicitly Declated Procedures 



1. Declare EMPNAME PIC X(15) 

2. EMPA6E PIC 999 

3. EMPSAL PIC 9(6)V99 

4. RDLCODE PIC 9(5) ; 

5. Begin Main Program 

6. Call EMPOPEN Using RDLCODE ; 

7. If (RDLCODE <> "00000") Then 

8. ' Begin Print •*DB-ER'ROR" ; STOP End ; 

9. Call EMPFETCH 

10. > Using EMPNAME EMPAGE EMPSAL RDLCODE 

11. While (RDLCODE » "OOOOO") Do 

12. Print EMPNAME ; 

13. Call EMPMODIFY Using EMPSAL RDLCODE ; 

14. Call EMPFETCH 

15. Using EMPNAME EMPAGE EMPSAL RDLCODE 

16. Endwhile; 

17. If (RDLCODE ■ "^OOIOO") Then 

18. Begin Call EHPROLLBACK ; 

19. Print "DB-ERROR"; STOP End ; 

20. Call EMPCOMMIT ; 

21. End M^in Program. 

1 . MODULE 

2. ENVIRONMENT PSEUDOLANGUAGE 

3. PROCEDURE EMPOPEN 

4. STATUS 

5. OPEN RAISEPAY CURSOR FOR 



10. PROCEDURE EMPFETCH 

11. N CHARACTER (15) 

12. A FIXED (3,10) 

13. S FIXED (8,2) 

14. STATUS 

15. FETCH RAISEPAY INTO N,A,S 

16. PROCEDURE EMPMODIFY 

17. S FIXED (8,2) 

18 . STATUS 

19. MODIFY EMP 

20. SET SAL ■ S + 5000 

21. WHERE EMP IS CURRENT OF RAISEPAY 

22. PROCEDURE EMPCOMMIT 

23. COMMIT 

24. PROCEDURE EMPROLLBACK 

25. ROLLBACK 

Figure 7 
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database t\> give each ' of the selected employees a $5,000 
salary increase. ^The top part is the calling program, and 
the bottom part is the RDL module. 

The chitef advan-tage of the explicit procedure approach 
is that no Change ^need be ,made to existing specifications of 
the programnlihg language; the main program of each example 
could be compiled by an existing programming language com- 
piler with ho modifications to either the program or the 
compiler. / The chief disadvantage is that the programmer 
must write (two programs to achieve one result and must 
create somewhat' superfluous names to link calls in the main 
program to proceduires in the database module. A more self- 
contained approach, requiring prep;:ocessing of the applica- 
tion program, is given in the nekt section. 



3.4 Implicit Procedure Calls J ' ^ 

This aipproachl for database access through an external 
language i,s£an extension of the preceding explicit procedvire 
approach. ' instead of writing a database module and the ap- 
plication program separately, the programmer writes a com- 
bined application with database statements intermixed with 
programming language statements. A preprocessor scans the 
user^s program "to produce a "pare" external program capable 
of compilation by a standard programming language qompiler 
and a separate database module, as described above, for in- 
terface to the DBMS. Such preprocessing may take place as a 
separate step, or it may be combined with the language com- 
piler. At the present time, proposed database standards ad- 
dress only explicit procedures; a standard syntax for impli-- 
cit procedures could be the subject of a subsequent database 
standard, or it could be defined by each separate program- 
ming language. 

The examples in Figures 8 and 9 are applications of 
this approach to achieve the same employee salary modifica- 
tion as in the preceding example. In each application the 
DBMS functions occur on separate lines of the program and 
are marked with the leading symbal "NDL-T or "RDL-". The 
NDL-MODULE and NDL-END MODULE Statements in lines 6 and 46 
of Figure 8, and the RDL-MODULE and RDL-END MODULE State- 
ments in lines 5 and 35 of Figure 9, are used by the prepro- 
cessor to produce a DBMS module satisfying the syntactic re- 
quirements specified by the database language. The NDL- 
CORRESPONDENCE or RDL-CORRESPONDENCE Statement in line 8 of 
Figure 8 and line 10 of Figure 9 gives a correspondence 
between programming language variables and DBMS data types. 
This correspondence allows the preprocessor to declare 



parameters with DBMS types for the procedures contained in 
the DBMS module. As noted above, this data type correspon- 
dence is automatic when the user is dealing with the speci- 
fied- standard languages and the selected data types from 
those languages? all other correspondences would be defined 
by the implementor or specified by a subsequent standard. A 
method of generating names for *the procedures is 
implementor-def in«d. All „ other ° NDL or RDL-stateihents in 
both examples fdllow exactly the syntax^ specif ied for state-- 
ments in the database procedure language. . 

* 

Figure 8 shows„ syntax that modifies the salaries in the 
sample network database. The sample program initiates a da- 
tabase s.ession over the PERSOf^NEL subschema, declares a 
keeplist, retrieves all EMP records representing employees 
whose current annual salary in thousands is less than twice 
their current age, stores these employees in the keeplist, 
and then retri^eves from the keeplist to print the employee's 
name and to modify the database by giving each such employee 
a $5,000 raise. Lines 12 and 13 bind the DBMS TEST parame- 
. ter to the RESULT variable and the DBMS STATUS parameter to 
the STATCODE variable* Line 16 tests to see if the singular 
set ALPHA contains any employee records as members. The 
result of the test is passed to the RESULT variable. Lines 
18 through 26 navigate the sequence of member recv>rds of the 
singular set ALPHA. Any exception conditions are passed to 
the STATCODE variable. In line 24, records that satisfy the 
condition for salary modification are stored in the keeplist 
TEMP. Lines 30-41 retrieve records from the keeplist, print 
the employee name, and modify the salary data item. The 
keeplist is maintained entirely by the DBMS so the program- 
ming language ne^d never define a data type for database 
keys. 

Figure 9 achieves the same salary modifications over 
the sample relational database. The sample program ini- 
tiates a database session over the relational database, de- 
clares a cursor with a select expression that identifies the 
desired employee records, 'and indicates that the identified 
records are to be modified by subsequent statements. Line 
14, binds the DBMS STATUS parameter to the RDLCODE variable. 
Line 16 activates the cursor defined in line 5; lines 19-20 
and 26-27 access *-the selected records one at a time and move 
values of their ' cplumn attributes into specified program 
variables. Lines 23-24 carry out the actual salaiy modifi- 
cations of the selected records. Lines 29-34 check the 
STATUS parameter to see if any error has pccurred and^roll 
the database back oi; commit the changes to concurrent appli- 
cations as appropriate. 
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Sample Program to Modify a Network Database" 
Using Implicit Procedure Calls 



Declare EMPNAME PIC X(15) 



MDL-MODULE SUBSCHEMA PERSONNEL IN COMPANY 

KEEPLIST TEMP ; 
NDL-CORRESPONDENCE 

. EMPNAME CHARACTER 15 

EMPAGE FIXED 3 0 

EMPSAL FIXED 8 2 

RESULT TEST 

STATCODE STATUS ; 
Begin Main Program 
NDL-READY EMB .PROTECTED RETRIEVE ; 
NDL-TEST SET EMPTY ALPHA ; 
If (RESULT « "1") Then STOP j 
NDL»FIND FIRST IN ALPHA FOR RETRIEVE ; 
While (STATCODE » "00000") Do 
NDL-6ET EMP 

SET EMPAGE TO AGE ) 

SET EMPSAL TO SAL ; 
If (EMPSAL < 2AEMPAGE*1000} Then 



NDL-FIND NEXT IN ALPHA FOR RETRIEVE ; 
Endwhile ; 

If (STATCODE <> "00100") Then 

Begin Print "ERROR"; STOP End; 
NDL-READY EMP PROTECTED UPDATE ^ 
NDL-PIND FIRST TEMP FOR UPDATE; 
While (STATCODE « "00000") Do 

NDL-6ET EMP 

SET EMPNAME TO NAME 
SET EMPSAL TO SAL ; 

EMPSAL <~ EMPSAL + 5000 ; 

Print ^EMPNAME ; 

NDL-MODIFY EMP 

SET SAL TO EMPSAL ; 

NDL-FREE FIRST TEMP ; 

NDL-FIND FIRST TEMP%OR UPDATE ; 
Endwhile ; 

If (STATCODE <> "00100") Then 

Begin Print "ERROR"; STOP End; 
NDL-COMMIT ; 
NDL-FINISH EMP ; 
NDL-END MODULE 
End Main Program. 



EMPAGE 
EMPSAL 
RESULT 



PIC 999 
PIC 9(6)V99 
PIC(l) 
PIC 99999 ; 



STATCODE 



NDL-KEEP CURRENT IN TEMP 



Figure 8 
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Sample Program to Modify a Relational Database 
Using Implicit procedure Calls 



11. 
12. 
13. 
14. 
15. 



1. Declare EMPNAME PIC X(15) 

2. EMPA6E PIC 999 

3. EMPSAL PIC 9(6)V99 

4. RDLCODE PIC 9(5) ; 

5. RDL-MODULE 

6. RDL-CORRESPONDENCE 

7. EMPNAME CHARACTER 15 

8. EMPAGE FIXED (3,10) 

9. EMPSAL FIXED (8,12) 

10. RDLCODE STATUS ; 
Begin Main Program 
RDL-OPEN RAISEPAY CURSOR FOR 

SELECT NAME, AGE, SAL 
FROM EMP 

15. WHERE SAL < 2*AGE*1000 

16. FOR MODIFY OF SAL ; 
\7. If (RDLCODE <> 0) Then 

18. Begin Print "DB-ERROR" ; STOP End ; 

19. RDL- FETCH RAISEPAY 

20. INTO EMPNAME, EMPAGE, EMPSAL ; 

21. While (RC'ODE = 0) Do 

22. Print iL^l.NAME ; 

23. RDL-MODIFY EMP 

24. SET SALxEMPSAL+5000 

25. WHERE EMP IS CURRENT OF RAISEPAY ; 

26. RDL-FETCH RAISEPAY 

27. INTO EMPNAME, EMPAGE, EMPSAL ; 

28. Endwhile ; 

29. If (RDLCODE <> "00100") Then 

30. Begin 

RDL-ROLLBACK ; 
Print "DB-ERROR" ; STOP 



30. 
31. 
32. 

33. End 

34. RDL-COMMIT ; 

35. RDL-END MODULE 

36. End Main Program. 
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The ch£e£ advantage o£ the implicit procedure approach 
is the convenience to the end programmer of being able to 
write self-contained application programs. Specifications 
for the preprocessor would define how NDL or RDL-MODULE 
statements and NDL or RDL-CORRESPONDENCE statements produce 
the resulting DBMS module with proper parameters for DBMS 
procedures. The chief disadvantage of this approach is that 
at present there is no standard specification for forming a 
database module from an application program containing em- 
bedded database statements; procedures could be defined in a 
number of different ways resulting in potentially different 
error messages. Another drawback is that the'application 
program variables are logically separate from the database 
parameters, thus requiring explicit SET clauses in database 
statements involving parameter passing between the DBMS and 
the accessing language. In the absence of a standard 
specification for a preprocessor, the explicit procedure ap- 
proach given above circumvents the chief disadvantage. In 
some cases, an integrated approach where DBMS function syn- 
tax is combined with programming language syntax, and data- 
base data items are uniquely identified with programming 
language variables, may be desirable. An example of such an 
approach is given in the next section. 



3.5 Native Syntax 

This approach for programming language interface to a 
database requires specifications for invoking DBMS functions 
directly as part of the programming language. Syntax for 
calling these functions is integrated into existing syntax 
of the language, thus requiring modifications to the pro- 
gramming language compiler for handling new verbs, new con- 
ditions, and possibly new reserved words. The programming 
language designers may choose to integrate database excep- 
tion conditions into the normal exception handling capabili- 
ties of the language, or to incorporate database test state- 
ments into the normal control statements of the language. 

Figure 10 is an example program over our network data- 
base for achieving the same employee salary modification as 
in the preceding examples. It is a COBOL program with in- 
tegrated DBMS functions written in proposed COBOL DNL syntax 
[X3J483]. The SUB-SCHEMA SECTION of each program names the 
subschema to be processed, names the record types and set 
types used as COBOL DML parameters in the program, names any 
keeplists, and provides a one-to-one correspondence between 
subschema records and data items and COBOL records and data 
ite. The' effect of the data item correspondence is that 

each execution of a COBOL DML GET, MODIFY, or STORE 





Sample Program to -Modify a lletwork Database 
Using COBOL and COBOL DML Syntax 



/ 1. 

2. 
3. 
4. 

7. 
8. 
9. 
10. 
11. 
12. 
13. 
14. 
15. 
16. 
17. 
18. 
19. 
^20. 
21. 
22. 
23. 
24. 
25. 
26. 
27. 
28. 
29. 
30. 
31. 
32. 
33. 
34. 
35. 
36. 
37. 
38. 
39. 
40. 
41. 
42. 
43. 
44. 
45. 
46. 



IDENTIFICATION DIVISION. 

PROGRAM- ID. MODSAL. 

ENVIRONMENT DIVISION. 

DATA DIVISION. 

SUB-SCHEMA SECTION. 

DB PERSONNEL WITHIN COMPANY. 

LD TEMP. 

RD RECORD SECTIQN. 
01 EMP. 

0^ NAME PIC X(i5) . A 
02 AGE PIC 999. 
02 SAL PIC 9(6}V99. 
SD ALPHA. 

WORKING-STORAGE SECTION. 

77 LASTREC PIC X(3) VALUE ^NO^ . 

PROCEDURE DIVISION. 

DECLARATIVES. 

LASTREC-ERROR SECTION. 

USE FOR DB-EXCEPTiON ON ^00100^. 

MOVE "YES'' TO LASTREC. 
ABNORMAL-ERROR SECTION. 

USE FOR DB-EXCEPTION ON OTHER. 

DISPLAY "DB ERROR". STOP RUN. 
END DECLARATIVES. 
MODI FY-DATABASE . 

READY EMP USAGE-MODE IS PROTECTED RETRIEVAL. 

IF ALPHA IS EMPTY STOP RUN. 

FIND FIRST EMP WITHIN ALPHA. 

PERFORM GET-REC UNTIL LASTREC • "YES". 

END-PERFORM. 

R^ADY EMP USAGE-MODE IS PROTECTED UPDATE. 
FIND FOR UPDATE FIRST WITHIN TEMP. 
MOVE "NO" TO LASTREC. 

PERFORM MODIFY-REC UNTIL LASTREC » "YES". 
COMMIT. FINISH EMP. STOP RUN. 
GET-REC . 

GET' £MP • 

IF SAL < 2*AGE*1000; KEEP CURRENT USING TEMP. 
FIND NEXT EMP WITHIN ALPHA. 
MODIFY-REC. 
GET EMP. 

ADD 5000 TO SAL. DISPLAY NAME OF EMP. 
MODIFY EMP. 

FREE FIRST WITHIN TEMP. 
FIND FOR UPDATE FIRST WITHIN TEMP. 
END-PERFORM. 



Figure 10 
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statement is equivalent to the corresponding NDL Statement 
with implicit SET clauses for transferring data values 
between a database record and a COBOL record. 

The type declaration for each data item in the SUB- 
SCHEMA SECTION must match the type declared for that item in 
the schema. Lines 17-24 of Figure 10, DECLARATIVES, pro- 
vide for automatic processing of database exception condi- 
tions so that the programmer need not check the' status re- 
gister after each invocation of a DBMS function. According 
to COBOL DML rules every possible exception condition must 
correspond to exactly one USE FOR statement. 



4. FEATURES OF THE NEa'WORK AND RELATIONAL MODELS 



Chapters 2 and 3 covered the basic structures and 
operations of the network and relational models, but both of 
them have c^:her features that seem more complex or require 
further discussion. This chapter focuses on such charac- 
teristics as arrays, recursive sets, cursors, nested 
queries, and triggers. It also considers some of the advan- 
tages and disadvantages of each model — i.e., what it does 
well and what it does only adequately or even poorly. 



4.1 The Network Model 

In its original CODASYL form the network model was 
designed for handling data that batch COBOL programs could 
process one record at a time. The current ANSI specifica- 
tion has diverged in a number of ways from its ancestor, but 
the basic orientation toward sequential processing still 
remains. Cursors, keeplists, and session states enable the 
DBMS to keep track of its position as it navigates through 
the database. Though the bias toward COBOL is gone from the 
syntax, many of the same operations are available, and with 
a few exceptions the data structures are the same as always. 



4.1.1 Cursors and Session State. The logical structures of 
a network database may be represented graphically as a col- 
lection of tables showing the record occurrences associated 
with each record type and the set occurrences associated 
with each set type. A table representation of a database 
whose logical structures are defined by the PERSONNEL 
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subschema is given in Figure 11. Bach record is identified 
by a unique identifier called a database key. The database 
key is never directly accessible 43y an end user but is used 
to define the effect of DBMS functions on the session state 
(see Figure 12) during a database session. A set is 
represented by the database keys of its owner and member 
records. In the set type tables of Figure 11, each owner 
database key is associated with a sequence of member data- 
base keys. The order of the sequence represents the order 
of member records within the set. 

The action initiated by a DBMS function is dependent 
upon values of cursors, keeplists, and ready lists contained 
in an associated session state. Figure 12 is a representa- 
tion of the session state specified by the Interpretive 
State section of the current draft proposed NDL for a data- 
base module acting over the PERSONNEL subschema. The ses- 
sion state is prepared by the DBMS prior to execution of the 
first procedure in a database session, and updated whenever 
DBMS statements change the state. 

r 

The cursors identify a single record from each record 
type and a single set from each set type of the subschema. 
Each record type cursor contains the database key of the 
current record of that record type; each set type cursor 
contains the database key of the owner record of the current 
set of that set type together with the database key of its 
current member? and the session cursor contains the database 
key of the current record of the session. A keeplist, which 
is a sequence of database keys that can be used by a pro- 
grammer to save references to individual records, is main- 
tained for each keeplist named in the associated module. 
The ready ' list, which is maintained by the DBMS to help in 
managing shared access to the database, is a list of record 
neunes together with their share specifications. 



4.1.2 Keeplists. As stated above, a keeplist is a sequence 
of database Iceys. Some database statements add database 
keys to the end of a named keeplist, and the user program 
has access to both the front and the end of the sequence. 
Keeplists, which are declared in a module, enable an appli- 
cation program to retain and use database keys for quick re- 
trieval of records to be examined or modified. They there- 
fore prevent the program from having to repeat searches for 
records that will- be used again. In Figure 10, lines 36-39, 
the procedure GET-REC tests the current EMP record and, if 
it qualifies, adds its database key to the keeplist TEMP for 
further work. After the last call to GET-REC, the procedure 
MOD I FY -DATABASE finds the first key in TEMP (line 32) and 
then calls the procedure MODIFY-REC, which modifies the 
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NETWORK DATABASE OCCURRENCES 
IN THE PERSONNEL SUBSCHEMA 



EMP Records 



DEPT Records 



DB-KEY 


NAME 


AGE 


SAL 


1 


Joe 


21 


18K 


2 


Adam 


36 


39K 


3 


Jane 


18 


38K 


4 


Sally 


32 


65K 



DB-KEY 


NAME 


LOC 


5 


MKT 


NY 


6 


MFG 





PAYROLL Sets 



MANAGES Sets 



OWNER 


MEMBERS 


5 


2, 3 


6 


4, 1 



OWNER 


MEMBERS 


« 

2 


3 


4 


1 



ALPHA Set 



OWNER 


MEMBERS 


sys 


2,3,1,4 
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Session State 
For the Network Database 



Session Cursor 

DEPT Cursor 
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record, frees the database key from the keeplist (line -44) , 
finds the next key in TEMP, and then repeats the process for 
the new record. 



4.1.3 Arrays. As noted in Section 2.2.5, arrays have re- 
placed repeating groups in the ANSI specification of the 
network model; None of the examples shows the function of 
arrays, but it is not difficult to envision a situation in 
which they would be useful. Although they are less flexible 
and powerful than sets, they offer some advantages in sim- 
plicity for data of fixed dimensions. For example, a record 
type containing information about computer users might have 
a component named TERMCHARS that would consist of ASCII 
characters whose sequence in the array would define the 
user^s terminal for certain application programs. Keeping 
this information in such a record would avoid the necessity 
of traversing a s^t and locating a separate record with th^ 
same information'. In a scientific database the importance 
of arrays would be even greater. 



4.1.4 Recursive Sets. Recursive sets are simply sets in 
which the owner record type is identical to one of the 
member record types. In Figure 1 MANAGES is a recursive set 
type. Each of its occurrences includes an EMP record as 
owner and the records of all employees who work directly 
under the person identified by the owner record. The word 
"directly" is critical here, since MANAGES sets obey the 
same uniqueness rules as any other set. No EMP record can 
be a member of more than one MANAGES set. Recursive sets 
make it easy to answer a number of questions about the rela- 
tionships between records. 

For example, to get a list of every employee whose im- 
mediate manager is under 30, we use the ALPHA set to examine 
each employee record. If the AGE component is less than 30, 
we add to our result the member records of the MANAGES set 
associated with the current record. When we finish travers- 
ing the ALPHA set, we have our answer. To get a similar 
list of all employees who are older than their immediate 
managers, we traverse ALPHA again, but this time we have to 
traverse every MANAGES set to select those member records 
whose AGE component is smaller than the AGE component of the 
owner record. An alternative form would require traversing 
ALPHA and for each employee record doing a find owner using 
MANAGES to obtain the record for the comparison of ages. If 
the task were to get a list of all employees who have a 
younger manager at any level, we would again traverse ALPHA 
one record at a time. This time, however, we would find the 
owner of the MANAGES set recursively until either the 
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starting EMP record qualified or we reached the tpp of the 
organizational structure. In all of these cases we would be 
taking advantage of existing data structures to navjiga^e ef- 
ficiently through the database. \ 



4.1.5 Singular Sets. Singular sets — sets whose types aXe 
defined with SYSTEM as the owner record type — provide orV 
dered access to member record types. Since there Is only 
one record occurrence of the type SYSTEM, there Is one set 
(hence, "singular") that contains all member records of each 
named type. Singular sets thus provide the same functional'^ 
Ity as record. order keys (one member record type) and .areas ' 
(multiple member record types) in the CODASYL model. If an 
organization needs a record type sorted In several ways, 
then the database administrator simply defines a new singu- 
lar set for each ordering. In the example database shown In 
Figure 1, ALPHA orders EMP records alphabetically according 
to the NAME component and thus supplements the ordering by 
department and salary, defined for PAYROLL sets. Host- 
language programs can use whichever order Is most appropri- 
ate for the application. 

By default, the network specifications provide direct 
access to each record type that is essentially equivalent to 
the existence of a singular set type with that record type 
as Its only member and having system default order. Thus, 
in Figure 1, it would be possible to find SUPERVISORS 
descriptions that satisfy a certain condition by navigating 
directly through the record type without first' nav'lgatlng 
through DEPT and EMP record types. 



4.1.6 Set Insertion and Retention. Set Insertion governs 
the initial Insertion of member records into sets and may be 
AUTOMATIC, MANUAL, or STRUCTURAL. If insertion is AUTOMAT- 
IC, then whenever a new occurrence of the member record type 
is stored in the database, it is automatically connected to 
the set identified in the session state by the set cursor 
corresponding to the set type in which the member record 
type is defined. This is the option chosen for SUPERVISORS 
and STAFF records In the network schema of Figure 4. As 
each new STAFF or SUPERVISORS record is stored in the data- 
base, it is connected automatically to the JOBHISTORY set 
identified by that set cursor in the session state. A set 
cursor may be positioned to the set owned by an EMP record 
simply by finding that record. If insertion is MANUAL, then 
records may be Inserted as member records in a set only by 
an explicit connect statement. This is the Insertion option 
specified for EMP records as members of both PAYROLL and 
MANAGES sets in the example. Finally, for STRUCTURAL 
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insertion, the effect on a new ^member record is similar to 
AUTOMATIC insertion, with the exception that the receiving 
set is not identified by a set cursor. Instead it is 
selected so that its owner record has values for certain 
data components equal to corresponding vailues in the new 
iKf^mber record. For example, if the EMP record type had a 
DEPT component as in the relational example, then it could 
be defined as a structural member of the PAYROLL set type 
with the structural specification defined as DEPT of EMP 
equal to NAME of DEPT. If such were the case, then line 21 
of Figure 6 would not be necessary as the connection would 
take place automatically. 

STRUCTURAL insertion also has an effect on the modify 
statement; if a structural data item of a member record type 
is modified, then it automatically become^ a member of a new 
set of the same.,set type having an owner record that satis- 
fies the structural condition. In the above modification of 
the network examp:|.e this would mean that any modification of 
the DEPT component in an EMP record would trigger a recon- 
nection of that employee to a DEPT record with a matching 
department NAME. In all of the above situations, the 
sequential position of the new member record in its set is 
determined by the ordering criteria defined for that set 
type. 

Set retention governs the <^ disposition of member records 
after they have been inserted into some set. Retention may 
be FIXED, MANDATORY, or OPTIONAL. For FIXED retention, a 
member record remains a member of the set of first insertion 
for its entire lifetime in the database. This is the option 
chosen for SUPERVISORS and STAFF records in the network 
schema of Figure 4. This declaration has the effect of 
linking job history descriptions permanently to a specific 
employee; if an employee record is erased from the database^ 
then all job history information pertaining to that employee 
is erased at the same time. 

Whenever MANDATORY retention is specified, a record, 
having once become a member record of a set, remains a 
member of some set of the same set type until it is erased 
from the database. This is the retention option specified 
for EMP records as members of PAYROLL sets in the network 
example. The effect of this declaration is that once an em- 
ployee record io first assigned to some department it must 
remain forever assigned to that department or to some other 
department in the database; it can never become unattached. 
Of course, since insertion is MANUAL, a new employee record" 
remains unattached to any department until it is first in- 
serted by a specific connect statement. Because of MANDATO- 
RY retention for employees, no department record may be 
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erased from the database without first erasing, or cdnnect- 
ing to other departments, all employee records in the PAY- 
ROLL set owned by that department. If desired, such erasure 
could be effected with a single erase statement using the 
full cascade option. In that situation all employees con- 
nected to the erased department would be erased » and the ef- 
fect of that erasure would cascade to the job history and 
all other records subordinate to the erased department 
record. 

With OPTIONAL retention, a member record may remain a 
member of the set it was connected to, or it may be removed 
either for reconnection to some other set or to become unas- 
sociated with any owner record. This option, is specified 
for the MANAGES set type in the example. Thus an employee 
may or may not have a manager recorded in the database, 
depending on whether or not ^he EMP record is connected via 
some MANAGES set to another employee. Under OPTIONAL reten- 
tion, if an^ EMP record is erased, then the effect may or may 
not cascade to subordinate employees, depending on whether 
the erase statement specifies full or partial cascade. 

4.1.7 Set Ordering. Set ordering is declared in the schema 
for each set type; it specifies a sequential ordering for 
the member records of each set. The ordering may be SORTED 
by component values, in which case additional KEY clauses 
specify the sort key and direction, or it may be one of the 
following: chronological* order as FIRST or LAST when com- 
pared with previous insertions; relative positioning as NEXT 
or PRIOR to a given member record; or system DEFAULT order 
^determined by the implementation. The schema in Figure 4 
shows only' the SORTED option for each member record type, 
but if JOBHISTORY member records were inserted chronologi- 
cally with respect to former job positions, then the effect 
o£ order FIRST (i.e. most recent insertion put first) would 
be equivalent. Likewise, in the ALPHA set for employee 
records, if each new employee record were inserted in its 
correct alphabetical position, then order NEXT or order PRI- 
OR woUld be equivalent to order SORTED by ascending NAME. 
System DEFAULT order, which allows the DBMS to optimize re- 
trievals without having to provide a specified order, will 
be the option of choice when no particular order is needed 
by users. The only requirement for the DBMS in DEFAULT order 
is that it be reproducible within a single transaction; 
between transactions the implementation may reorder the 
member records of any such set. 



Even in multiple roemBer set ti'^^s, like TTOBHISTORY in 
±he example, there is only a single sequential order for the 
member records of each set. ^ In that example, the <SUPERVI- 
SORS and STAFF member recpcds might be intermixed since 
STARTDATE is a gommon key item. If the database administra- 
tor wished - to partition job history descriptions by job 
category - supervisor or staff — then those *member record 
types would be ordered by an optional declaration in the 
SORTED clause. 



4.1.8 Complex Data Structures. As the previous sections 
have demonstrated, the network model has a variety of com- 
plex and powerful data structures. While the use of 
p°ointers like . dataUbase keys, enables th^ DBM^ to navigate 
through a database efficiently, the complexity of sets, cur- 
sors, keeplists, and other structures makes €he network 
model better suited to prQ<duction applications than to ad 
hoc data manipulation. Many people who have little or no 
programming experience could find the data structures of the 
network model difficult to learn and use. Such complexity 
can also make it expensive to redefine data structures. 
With so much information^ already encoded, the network model 
probably works best ^or "static" databases, where structures 
and programs that navigate through them have a relatively 
long life span. The values of the database could change 
frequently, but the structures should probably be stable. 



4.2 The Relational Model 

From its beginning the relational model has developed 
independently of any programming language. Its theoreti- 
cians and developers have consistently focused on the needs 
of interactive users who might prefer not to write programs 
in host language. This focus remains in the current ANSI 
draft, though of course the specification does provide for 
interfaces with traditional programming languages. The re- 
lational model has repeatedly emphasized two features: the 
simplicity of presenting data in tabular form, and the 
mathematical soundness of table definition and manipulation. 
Each of the following sections discusses either a feature of 
the model or some Consequences of its mathematical elegance. 



4.2.1 Nesting and Range Variables. Relational DBMS's offer 
two complementary techniques for answering recursive queries 
that the network model would handle with recursive sets. 
The first of these involves variables that range over speci- 
fied tables and enable the user° to make comparisons such as 
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X.NAME B y. MANAGER.- In the current draft of the ANSI rela- 
tional specification these variables can be implicit — the 
table names themselves — or explicitly declared synonyms 
with a limited scope. A second technique involves nested 
queries, in which the WHERE clause of one °query includes a 
sub-query. The result of the sub-query must be a table with 
exactly one column. 

For example, a user of the sample database could .find 
all the employees whose immediate managers ajre younger than 
30, by running the query in figure 13(a). Though the imple- 
mentation of this query could vary, the logic would remain 
the same. The sub-query returns a one-column table listing 
the names of all employees under 30. The, main query then 
examines every row of EHP and selects the employee name and 
manager whenever the manager is in the table of names re- 
turned by the sub-query. Logically, a relational t}BMS would 
have to. retrieve and examine every row in -twice, once 
for the sub-query and once for the quejry. For , n roWs in 
^P, the execution time would be proportional to 2n. Of 
course, an index on AGE could substantially reduce the exe- 
cution time of the sub-query, but execution time for the 
whole query would ° still be 0(n). ^ 

In thd' ^previous example the predicate within the sub- 
query was absolute — i.e.,^ it involved a column value and a 
constant. When the predicate becomes relative, as in Figure 
13(b), the query is morfi complicated. In this case we .want 
to know the name and manager of every employee who is older 
than his manager. Logically, the DBMS must retrieve each 
row of EMP and then search EMP.^gain for the row whose NAME 
column matches the MANAGER column of the retrieved row. If 
the result of the sub-^uery is more than a single value, it 
is an error. For n rows in EMP, the execution time is pro- 
portional to n**2, and an index on AGE is Unlikely' to itn- 
prove the speed. 

Both of the previous queries involved a single nested 
sub-query. The relational specification allows nesting, to 
any level, but that level must be fixed in a given query. 
In the current draft of the standard, there is no way to 
formulate a query that will return the names of all employ- 
ees who are older than at least one of their .managers, im- 
mediate or higher. Because the specification omits control 
structures, a user who needs the answer to this question 
must write a program that climbs the organizational tree im- 
plied by the MANAGER column in the EMP table. A query in 
the relational data manipulation language might require two 
levels of nesting for some employees and five levels for 
others. A host language program, however, can retrieve rows 
one at a time and store in its own data structure^ the 
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Nested Queries in the Relational Data Language 



SELECT NAME, MANAGER 

FROM . EHP 

WHERE EMP. MANAGER IN 



(SELECT 
FROM 
WHERE 



NAME 
EMP 

EHP. AGE < 30) 



(a) 



SELECT NAME, MANAGER 
FROM EMP AS X 

WHERE X.AGE > 



(SELECT 
FROM 
WHERE 



AGE 

EMP AS Y 
X. MANAGER 



- Y.NAME) 



(b) 



Figure 13 
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necessary data to answer the query. 



4.2.2 Views* A view in the relational model is analogous to 
a subschema in the network model — i.e., it defines a por- 
tion of the database as it will appear to particular users. 
Each view is a derived table because it results from the ex- 
ecution of a statement, «namely a SELECT statement; it 
differs from other derived tables because the SELECT state- 
ment (actually, the <query 3pec>) that defines it is a per- 
manent part of the database rather than just an operation 
during a user session. If the <query speo selects columns 
from only a single table without a sub-query, then the 
resulting view is updatable; otherwise, it is read-only. A 
database adihinistrator may wish to use the CREATE VIEW 
statement to establish particular views of the data for 
users with similar privileges or duties. 

, v' 

For example. Figure 14 shows the syntax for defining a 
view of the example relational database that would be suit- 
able for employees who need information about the current 
position of both supervisors and staff. Here the <query 
speo is the union of two SELECT statements, each of which 
joins the EMP table with one of. the two tables containing 
the job history of current employees. For each employee 
only one row from either SUPERVISORS or STAFF will satisfy 
the condition that ENDDATE be null. The view definition 
will join the POSITION and STARTDATE columns from that row 
with the row from the EMP table thut has the same value in 
the NAME column, and the result will be a derived table with 
the columns shown in the figure. Because the view results 
from more than a single table, it is not directly updatable. 
It could, however, be used as a source for comparisons with 
one of the base tables, which could then be updated by users 
with appropriate privileges. 



4.2.3 Triggers. Because relational databases depend so 
heavily on operations rather than structures to maintain re- 
lationships among data values, they invariably have a high 
degree of dajta redundancy. Frequent modif ica'iion of data 
values that occur in several tables could result in an in- 
consistent database. To avoid this problem, the current 
ANSI specification provides referential integrity, which is 
a technique for tying together corresponding data from dif- 
ferent tables. In the example database defined in Figure 5, 
for example, modification of an employee's name or deletion 
of an employee's record from the EMP table would cascade 
through to the SUPERVISORS and STAFF tables. Any rows in 
those tables that refer to modified rows in the EMP table 
automatically undergo the same modification on the 
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Creating a View in the Relational Data Model 



CREATE VIEW 
AS 



EMPJOBINFO 



SELECT EMP.ALL, SUPERVISORS. POSITION, 
SUPERVISORS . STARTDATE 



FROM 
WHERE 



EMP, SUPERVISORS 

SUPERVISORS. NAME " EMP. NAME 

AND SUPERVISORS :ENDDATE « NULL 



UNION 

SELECT EMP.ALL, STAFF. POSITION, STAFF. STARTDATE 

EMP, STAFF 



FROM 
WHERE 



STAFF. NAME - EMP. NAME 

AND STAFF. ENDDATE - NULL 



t 

EMPJOBINFO 




NAME 


AGE 


SAL 


DEPT 


MANAGER 


POSITION 


STARTDATE 
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corresponding columns. Any rows referring to deleted EMP 
records are also deleted. 

The EMP table itself is in turn tied by a referential 
constraint to the DEPT table. Modifications to the NAME 
column in the DEPT table cascade through to the DEPT column 
in the EMP table. Deletions, however, work differently. 
The RESTRICT ERASE constraint ' on the DEPT column in the EMP 
table prohibits users from deleting department rows without 
first modifying or deleting any corresponding rows in the 
EMP table. In thi.s case the constraint actually works in 
the opposite direction, preventing a modification that would 
result in an inconsistent database. 



4.2.4 Flexibility. By relying on operations rather than 
structures for much of its semantics, the relational model 
makes possible a flexibility that' neither the network model 
nor any other structure-oriented model can match. Users who 
have either a "dynamic" database or frequent ad-hoc requests 
for specialized information may find this flexibility very 
appealing. The current ANSI specification makes modifying 
and retrieving data very simple in a large number of cases, 
yet still enables a user to pose elaborate, complex queries 
without incurring the overhead of a complex host language 
program. In many environments such flexibility may match 
exactly the needs of the user. However, the reliance on re- 
trieval by value rather than by logical pointers will prob- 
ably kec.'p the relational model from performing as well as 
the network model on "static" databases with infrequent 

structural changes but a lot of data processing*. 

/ 

/ 

/ 

4.2.5 Mathematical Precision and Elegance. Since Codd's 
original formulation the people who have done research on 
the relational model have in general preserved its firm 
mathematical _bas is. The builders of experimental and com- 
mercial systems have followed suit, though they have usually 
found it convenient or expedient to give up the elegant but 
sometimes impractical view of a table as a set of rows with 
no duplicates allowed. In its current \draft the relational 
standard acknowledges the practical importance of allowing 
duplicate rows. It does insist, however, on another impor- 
tant feature of the model, the closure of tables under all 
operations on them. This feature ensures that the result of 
any relational operation is always another relation, or 
table. No matter how simple or complex a SELECT statement 
is, its result will still be a table and hence will be con- 
ceptually simple. 



ERIC 



-50- 

58 



For many users the relational data language will seem 
straightforward for most operations. Its English-like ver- 
sion of the predicate calculus hides some mathematical con- 
structs that can confuse a user who is unfamiliar with the 
language of logic and mathematics. The full power of the 
predicate "tsalculus, however, still shows up in the WHERE 
clause, which specifies criteria for qualifying or selecting 
rows from appropriate tables.' Because this clause is power- 
ful and elegant enough to meet the needs of sophisticated 
users, it may sometimes prove difficult for less experienced 
people to learn and use. .Database administrators may wish 
to construct menu-driven query capabilities for the most 
common requests .by inexperienced users* 



4.2.6 Performance. In „ its early experimental implementa- 
tlons the relational model gained a reputation for poor per- 
formance* Queries could take longer for the DBMS to answer 
than a user would need to search through printed output. 
Commercial implementations have improved performance consid- 
erably, often by adding indexes on columns that are fre- 
quently used in retrievals or by clustering values to optim- 
ize relational joins. Indexes and clustering may add a lot 
of overhead to relattional DBMS^s, but they can reduce 
response times to acceptable levels. In essence, an index 
on a column provides the relational model with pointers that 
are similar in function to those of the network model. 
Without these pointers the relational model must do a lot of 
searching to answer even some simple queries, and physical 
storage decisions become important factors in performance. 
Clustering often has the same physical characteristics as. 
the implementation of network model sets. Because indexes 
and clustering affect only the performance of a DBMS and not 
its functionality, they are omitted from the standard and 
left to individual implementors. 



5. SELECTION ISSUES 



Choosing the right data model for a particular set of 
application requirements is almost certainly the first and 
most import" nt step in the selection process. But it is not 
the las.. This chapter discusses some of the other aspects 
of a DBMS that the buyer should also consider before actual- 
ly purchasing a system. Different vendors may vary enor- 
mously in the markets they try to reach and in the features 
they offer their customers. Those features include user in- 
terfaces, support software, compatibility with hardware and 
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operating systems, and customer services. Only the poten- 
tial buyer can decide which commercial product \ best meets 
the needs of the users" applications. 



While the ANSI specifications of the network and rela- 
tional models describe in detail available data structures 
and operations, they do not establish or even recommend 
values for a number of critical system parameters. A user 
of a relational system may want to add the requirement that 
the DBMS support at least a given number 6f columns per 
table, rows per table, tables per database, or even data- 
bases per operating DBMS. Similarly, a, network user may 
need a minimum number of components per record type, records 
per record type, record types per databas^fe, sets per set 
type, or set types per database. Identifying appropriate 
values for these parameters could be the difference between 
meeting application requirements and buying khe wrong pro- 
duct for the job. Understating the requirements could 
result in a system that will quickly be outgrown, and over- 
stating them could result in an unnece.ssarily slow or awk- 
ward system. 

There are also a number of parameters that are indepen- 
dent of data model. One application may require either very 
long or variable length character strings, while another may 
require either a given level of precision in real numbers or 
a large maximum for integer values. Some applications may 
require specialized data types like bit strings to represent 
video pictures or enumeration types to represent finite sets 
of alternatives. In some environments it may be critical 
for the DBMS to allow at least a given number of simultane- 
ous users, either of the DBMS or of a particular database. 



5.2 Hardware and Operating System Support 

For many organizations one of the primary factors in 
selecting a DBMS may be compatibility with particular 
hardware and a specific operating system. In that case the 
purchaser should make sure that prospective products meet 
all of the organization's requirements in the existing 
hardware and software environments. Even the, same commer- 
cial product may vary in important ways from one implementa- 
tion to another. If the organization is procuring a comput- 
er, an operating system, and a DBMS simultaneously, or if it 
may soon change hardware or operating systems, then the 
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portability of prospective DBMS^s is an issue. A product 
that runs on a variety of machines may give the customer im- 
portant flexibility (e.g., in converting to another DBMS or 
to a distributed database) that could outweigh the immediate 
performance or usability of the DBMS. 

Besides the possible requirement of compatibility with 
particular brands of hardware and software, users have to 
consider such features as the memory requirements of the 
DBMS. A microcomputer with only 64K of memory may not be 
suitable for a DBMS that requires 48K just for its execut- 
able image. Similar problems could occur on larger 
machines, where the DBMS might take up enough main memory to 
hamper its own operation or the operation of other programs 
running under the same operating system on the same, proces- 
sor. Since many DBMS^s work with large databases, a pros- 
pective buyer may need to find a match between existing disk 
capacity and the secondary storage requirements of a partic- 
ular product. DBMS^s may vary considerably in the space 
overhead necessary for efficient performance. A customer 
needs to know whether a prospective DBMS can share a disk 
with other files, and whether the DBMS and operating system 
are compatible in their handling of files on disk. If effi- 
cient operation of the DBMS requires that files be laid out 
in contiguous sectors on disk, then either the operating 
system must allocate disk space that w^y or the DBMS must 
handle its own I/O. 



5.3 Backup and Recovery Facilities 

Establishing and operating a database are expensive and 
ti,me-consuming projects, and the DBMS must provide adequate 
means to protect the user^s investment. No matter how good 
the DBMS, the support software, and the computer system are, 
they will suffer occasional system crashes. Every DBMS 
should have some strategy for protecting the database 
against these failures. At the very least the user should 
demand a convenient way to backup the entire database in 
case of a disastrous loss of on-line data. Weekly full 
backups supplemented by daily incremental backups are one 
example of a backup strategy, but losing even a day's tran- 
sactions on a busy database could be costly* Some products 
provide a journaling facility that keeps a record of all 
transactions since the last backup. The journal file may 
even be maintained on a separate disk to avoid its being 
scrambled at the same time as the database. In the case of 
minor crashes the DBMS can simply repeat all the transac- 
tions logged since the last successful update of the data- 
base on disk. For individual users some DBMS's, including 




the proposed standards, provide commit points that enable a 
user to specify a series of database executions as a single 
transaction that Is either completed successfully or not 
done at all. Further guidance on protecting data files Is 
/available In other publications available from the Institute 
for Computer Sciences and Technology (ZCST) within the Na- 
tional Bureau of Standard^ (NBS) [FIPS79, FlPSSla] . 



5.4 Bulk Loading and Unloading 

While a number of databases may have to be keyed in at 
a terminal, the user will often want to prepare data files 
without invoking the DBMS or to convert existing data files 
into a format suitable for it. Bulk loading of a DBMS from 
data files is probably a requirement for most production ap- 
plications. A flexible loading utility that can handle a 
variety of input file formats could certainly enhance the 
attractiveness of a particular product. In some cases up- 
grades of the DBMS or even minor changes in the schema de- 
finition may require dumping the entire database and reload- 
ing it after the change. Under those circumstances, of 
course, bulk loading and dumping utilities would be a neces- 
sity. 



5.5 Schema Manipulation 

A number of DBMS's offer their users operations or 
techniques for redefining data structures without having to 
dump and reload the entire database. Adding a column to a 
table or a field to a record is the type of schema manipula- 
tion an organization may need as it expands the kinds of 
data that it collects and uses. A slight change in a law or 
regulation may force an agency to keep more complicated or 
detailed records of its work. Potential customers should 
consider not only the current requirements of applications 
sharing a database, but also the likelihood that the data 
structures may change with time. 

While on some DBMS's the facilities for redefining a 
schema may consist of only a few commands, on others there 
may be a fully Integrated data dictionary to record and con- 
trol schema changes. In still other environments the data 
dictionary may stand entirely alone or work with the DBMS as 
a separate but related product. A data dictionary can help 
a database administrator manage a database by identifying 
relations among structures, users of particular subschemas, 
or programs and applications that depend on specific data 




structures. Both ANSI and ICST are now working to produce a 
standard data dictionary system. Such a tool can make mani- 
pulation of a schema safer « but of course It Imposes addi- 
tional costs and overhead on data management. W!iether or 
not a data dictionary is appropriate for a particular organ- 
ization depends on the nature of both the database and the 
database administrator'* s strategy for controlling the data. 



5.6 Access Control 

Most applications and user environments require some 
form of control over access to data. In many cases a DBMS 
will offer a subschema or view facility that enables the da- 
tabase administrator to restrict users to viewing only the 
data that are necessary for their work. The extent of this 
control could vary considerably from one system to the next. 
Potential customers must understand both application re- 
quirements and DBMS products well enough to know which of 
the following questions are relevant to their plans: 



1. Does access control extend to the level of records or 
fields? 

2. Does the product provide access control based on data 
values or only on data structures? 

3. Can users be denied the use of certain operations on 
the data? 

4. Can users update records or tables if they have ac- 
cess only to certain fields or columns? 

5. Does the database administrator have sufficient con- 
trol over access rights to tailor users" privileges 
to match their needs? 

6. How easy is it to change access rights to reflect 
changes in user needs? 

7. Are access rights attached to the user or to the 
data, and which way better suits application require- 
ments? 



Whether it is Implemented through a subschema facility or 
through some other mechanism, access control is an important 
consideration for almost any organization that wants 
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centralized data management. 



5.7 Concurrency Control 

Sharing data will almost inevitably cause conflicts 
between users who want access to the same data at the same 
time tPIPSBlb] . Fortunately, research on concurrency con- 
trol in pperating systems applies directly to problems in 
common databases. Anyone considering a DBMS £or multiple 
users should make sure that the product has adequate 
deadlock prevention or at least detection. A system with 
only deadlock detection should provide a means for rolling 
back the transactions of all blocked processes and giving 
one of them the necessary priority to complete the 
deadlocked transaction. If the DBMS simply grinds to a halt 
when it detects a deadlock, users may find themselves wast- 
ing valuable time whenever they want to update tne database. 

The proposed NDL and RDL standards handle concurrency 
by defining the notion of a transaction, which is a sequence 
of operations (statement executions) that is atomic with 
respect to recovery and concurrency. A transaction ter- 
minates with commit or rollback. If it terminates with 
rollback, then all changes that it made to the database are 
canceled. If it terminates with commit, then the changes 
become part of the database. Committed changes cannot be 
canceled, and changes made to the database by a transaction 
cannot be perceived by other transactions until such time as 
that transaction terminates with commit. The execution of 
concurrent transactions is guaranteed to be serializable, 
which means that the execution of the operations of con- 
current transactions produces the same effect as some serial 
execution of thosie same transactions. A serial execution is 
one in which each transaction executes to completion before 
the next transaction begins. Serial izable execution of 
transactions implies that all read operations are reproduci- 
ble within a transaction, except for changes made by the 
transaction itself. 



5.8 Access Languages 

Users of a database may demand specific languages for 
their applications, and the purchaser of a DBMS should cer- 
tainly try to match those demands against the features of 
competing products. If an organization has an absolute re- 
quirement for database access from FORTRAN, COBOL, and Pas- 
cal programs, then it must make sure not only that the DBMS 




allows such access, but also that the planned environment 
has suitable compilers for those languages. I£ a particular 
product allows access from any language that can call 
separately compiled procedures, but runs only under an 
operating system for which no one has written a COBOL com- 
piler, then it cannot meet the organization" s needs. , 

Besides access from traditional programming languages, 
users may want even higher-level languages that enable them 
to retrieve data quickly without going to the trouble of 
writing a program. Navigational systems usually offer fa- 
cilities that allow ad-hoc retrieval of data for queries or 
reports. Relational systems generally offet a query 
language based on either the relational algebra or the rela- 
tional calculus, and many products based on other models 
offer similar languages. Because of their precision and 
elegance, these languages appeal to sophiaticated users like 
the people who have to select and administer a DBMS. But 
.for the typical nonprogrammer a language based on predicate 
calculus is probably unsuitable. For these users many ven- 
dors offer powerful user languages and report writers that 
simplify access to the database. One vendor "s user language 
reportedly offers a 96 percent savings in code and a 75 per- 
cent savings in effort relative to COBOL [DRAP81] . Though 
there are no candidate standards for user languages and re- 
port writers, some organizations may consider them very at- 
tractive features of a prospective DBMS. Users should be 
aware, though, that reliance on one vendor "s reporting 
language for building an application means either total 
dependence on that vendor through the application's life cy- 
cle, or expensive rewriting in another language. 



5.9 Display Features 

While a wide range of access languages may be essential 
to some DBMS customers, it might not be enough. A number of 
companies offer attractive application packages that take 
advantage of the display features of intelligent terminals. 
Such packages can make database management substantially 
easier, especially in an environment where a number of data- 
base users have little or no programming experience. A 
forms processor capable of reading a screenful of data at a 
time can eliminate some of the syntactic errors that inevit- 
ably occur in writing a program, no matter how simple the 
language. Of course, such a powerful tool may require 
equally powerful protective mechanisms to ensure the in- 
tegrity of the database. 



5.10 Support, Training, and Documentation 

Most o£ the features mentioned so far have to do with 
the DBMS itself. In procuring a software system as complex 
as a DBMS, the customer must also consider the quantity and 
quality of services provided by the vendor. Few, if any, 
user organizations can handle their own software support for 
a proprietary package, and training and documentation are 
essential. Potential customers should determine the length 
and coverage of vendor warranties, the cost of continued 
software support and system upgrades, and the skill and 
responsiveness of the vendor^s technical support staff. 
They should also iind out as much as possible about training 
courses and examine system documentation, including on-line 
help facilities. The quality of these customer services 
could greatly affect the ease of using and maintaining a 
DBMS. 



5.11 Existing User Base 

For many products as complex as a DBMS there are na- 
tional and local users" groups that can offer additional 
support to new customers. Although frequently sponsored by 
the vendors themselves, these. groups generally express the 
interests of the users. Through organ^ization customers can 
often achieve effective, reliable. response to their needs 
for bug fixes, design changes, or enhancements in future 
releases of a system. . Furthermore, other users of a partic- 
ular product may have developed similar application packages 
or local programs that can significantly decrease the time 
and cost of developing application systems for use with the 
DBMS. User groups offer a pool of experienced people who 

can benefit from each other''s mistakes and successes. 

», 



5.12 Benchmarks and Prototypes 

However much a customer may learn about a DBMS through 
discussion and reading, nothing substitutes for a good 
demonstration^ If it is at all practical, an organization 
planning extensive database applications may want to develop 
benchmarks or prototypes to test whether a given product 
really does meet user requirements. ' For smaller systems a 
feature analysis or some actual experience with small proto- 
types may be sufficient. While building benchmark programs 
or constructing a prototype application may be costly, it 
may provide crucial performance results that otherwise would 
not be available until after purchase. In some cases 
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vendors may be willing to help build a sample database with 
appropriate programs to demonstrate the practical benefits 
of their particular product. Careful comparison of bench- 
mark results could be the determining factor in justifying 
the selection of one DBMS over another. A forthcoming NBS 
Special Publication discusses some alternatives to bench- 
marking [LETM83]. 



6. CONCLUSION 



People who are planning to use a DBMS need a systematic 
way to match application requirements with the features of 
.commercial produc^ts. By defining basic structures and 
operations, data models give users a tool both for analyzing 
their requirements and for comparim alternative systems. 
Ideally, it would be helpful to have precise specifications 
of tm^ data models underlying all commercial DBMS^s. In 
practid"**^ terms, however, we have to rely on models whose 
,§pecif ications are in the public domain and are therefore 
accessible to any company that wants to build a conforming 
product. ANSC X3H2 is working on two such spec i'ficat ions, 
one for net;work and another for relational databases. 

' These two models differ not only at'^the level of indi- 
vidual structures and operations, but also in their design 
"philosophies." The network model provides some complex 
inter-recoid structures that can contain a lot of valuable 
information. Structures ii\ the relational model are 
simpler, but powerful relational operations compensate for 
the lack of structural information. As a result of this 
difference in orientation, the network and relational models 
serv^ different needs. A network DBMS may be more appropri- 
ate for an organization whose applications are relatively 
stable and therefore require few structural changes. Rela- 
tional DBMS's rely so heavily on data mani^pulation that they 
can often accommodate new, unforeseen data requirements. 

While ,there are far more data models than the two 
currently in the process of standardization, most of the 
others are either structure-oriented like the network model 
or operation-oriented like the relational model. Particular 
vendors have in many cases developed tools or features that 
compensate for some of the limitatipns inherent in the data 
model underlying their products. Potential customers, how- 
ever, can still use the discussions of network and relation- 
al models presented in this report as guidance in their 
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evaluation of other mo^.els and as a caution to recognize 
that every product has its advantages and its disadvantages. 
The difficult job is still to m«tch the features of a pro- 
posed DBMS with the particular application requirements of 
the users* 
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